Vanishingly Sparse Matrices and Expander Graphs, 
with application to compressed sensing 
Abstract: We consider a probabilistic construction of sparse
random matrices where each column has a fixed number of 
nonzeros whose row indices are drawn uniformly at random. 
These matrices have a one-to-one correspondence with the adja- 
cency matrices of fixed left degree expander graphs. We present 
formulae for the expected cardinality of the set of neighbors for 
these graphs, and present tail bounds on the probability that 
this cardinality will be less than the expected value. Deducible 
from these bounds are similar bounds for the expansion of the 
graph which is of interest in many applications. These bounds are 
derived through a more detailed analysis of collisions in unions 
of sets using a dyadic splitting technique. The exponential tail 
bounds yield the best known bounds for the $\ell_1$ norm of large
rectangular matrices when restricted to act on vectors with few
nonzeros; this quantity is referred to as the $\ell_1$ norm restricted
isometry constants (RIC1) of the matrix. These bounds allow for
quantitative theorems on existence of expander graphs and hence 
the sparse random matrices we consider and also quantitative 
compressed sensing sampling theorems when using sparse non 
mean-zero measurement matrices. 

Index Terms: Algorithms, compressed sensing, signal processing,
sparse matrices, expander graphs.



I. Introduction 

Sparse matrices are particularly useful in applied and com- 
putational mathematics because of their low storage complex- 
ity and fast implementation as compared to dense matrices, 
see [1], [2], [3]. Of late, significant progress has been made to
incorporate sparse matrices in compressed sensing, with [4],
[5], [6], [7] giving theoretical performance guarantees and
also exhibiting numerical results that show sparse matrices
coming from expander graphs can be as good sensing matrices
as their dense counterparts. In fact, Blanchard and Tanner [8]
recently demonstrated in a GPU implementation how well
these types of matrices perform compared to dense Gaussian and
Discrete Cosine Transform matrices, even with a very small fixed
number of nonzeros per column (as considered here).

In this manuscript we consider random sparse matrices that 
are adjacency matrices of lossless expander graphs. Expander
graphs are highly connected graphs with very sparse adjacency
matrices; a precise definition of a lossless expander graph is
given in Definition 1.1 and a diagrammatic illustration in
Figure 1.

Definition 1.1: $G = (U, V, E)$ is a lossless $(k, d, \epsilon)$-expander
if it is a bipartite graph with $|U| = N$ left vertices,
$|V| = n$ right vertices and a regular left degree $d$, such
that any $X \subset U$ with $|X| \le k$ has $|\Gamma(X)| \ge (1-\epsilon)d|X|$
neighbors.

Remark 1.2: 1) The graphs are lossless because $\epsilon \ll 1$;

2) They are called unbalanced expanders when $n \ll N$;

3) The expansion of a lossless $(k, d, \epsilon)$-expander graph is
$(1-\epsilon)d$.

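[Figure 1: bipartite graph $G(U,V,E)$ showing a set $X \subset U$ with $|X| \le k$ and its set of neighbors $\Gamma(X)$.]

Fig. 1. Illustration of a lossless $(k, d, \epsilon)$-expander graph with $d = 2$.
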
Such graphs have been well studied in theoretical computer
science and pure mathematics and have many applications 
including: Distributed Routing in Networks, Linear Time 
Decodable Error-Correcting Codes, Bitprobe Complexity of 
Storing Subsets, Fault-tolerance and a Distributed Storage 
Method, and Hard Tautologies in Proof Complexity; see [10]
or [9] for a more detailed survey. Pinsker and Bassalygo [11]
proved the existence of lossless expanders and showed that any
random left-regular bipartite graph is, with high probability,
an expander graph. Probabilistic constructions with optimal
parameters n, N exist but are not suitable for the applications
we consider here. Deterministic constructions only achieve
sub-optimal parameters; see Guruswami et al. [12].

Our main contribution is the presentation of quantitative
guarantees on the probabilistic construction of these objects
in the form of a bound on the tail probability of the size of
the set of neighbors, $\Gamma(X)$ for a given $X \subset U$, of a randomly
generated fixed left-degree bipartite graph. Moreover, we provide
deducible bounds on the tail probability of the expansion of
the graph, $|\Gamma(X)|/|X|$. We derive quantitative guarantees for
randomly generated non-mean-zero sparse binary matrices to
be adjacency matrices of expander graphs. In addition, we
derive the first phase transitions showing regions in parameter
space that depict when a left-regular bipartite graph with a
given set of parameters is guaranteed to be a lossless expander
with high probability. The key innovation in this paper is the
use of a novel technique of dyadic splitting of sets. We derive
our bounds using this technique and apply them to derive $\ell_1$
restricted isometry constants (RIC1).

1 Lossless expanders with parameters d, k, n, N are equivalent to lossless
conductors with parameters that are base 2 logarithms of the parameters of
lossless expanders; see [9], [5] and the references therein.

Numerous compressed sensing algorithms have been
designed for sparse matrices [4], [5], [6], [7]. Another
contribution of our work is the derivation of sampling theorems,
presented as phase transitions, comparing performance guarantees
for some of these algorithms as well as the more traditional
$\ell_1$-minimization compressed sensing formulation. We also show
how favorably $\ell_1$-minimization performance guarantees for
such sparse matrices compare to what $\ell_2$ restricted isometry
constants (RIC2) analysis yields for the dense Gaussian
matrices. For this comparison, we used sampling theorems
and phase transitions from related work by Blanchard et al.
[13] that provided such theorems for dense Gaussian matrices
based on RIC2 analysis.

The outline of the rest of this introduction section goes as
follows. In Section I-A we present our main results in Theorem
1.6 and Corollary 1.7. In Section I-B we discuss RIC1 and its
implication for compressed sensing, leading to two sampling
theorems in Corollaries 1.10 and 1.11.

A. Main results 

Our main result is about a class of sparse matrices coming
from lossless expander graphs, a class which includes non-mean
zero matrices. We start by defining the class of matrices we
consider and the key concept of a set of neighbors used in the
derivation of the main results of the manuscript.

Definition 1.3: Let $A$ be an $n \times N$ matrix with $d$ nonzeros
in each column. We refer to $A$ as a random

1) sparse expander (SE) if every nonzero has value 1 

2) sparse signed expander (SSE) if every nonzero has value 
from {-1, 1} 

and the support set of the d nonzeros per column are drawn 
uniformly at random, with each column drawn independently. 

SE matrices are adjacency matrices of lossless $(k, d, \epsilon)$-expander
graphs while SSE matrices have random sign patterns
in the nonzeros of an adjacency matrix of a lossless
$(k, d, \epsilon)$-expander graph. If $A$ is either an SE or SSE it will
have only $d$ nonzeros per column and since we fix $d \ll n$, $A$ is
therefore "vanishingly sparse." We denote by $A_S$ the submatrix
of $A$ composed of the columns of $A$ indexed by the set $S$ with
$|S| = s$. To aid translation between the terminology of graph
theory and linear algebra we define the set of neighbors in
both notations.
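As an illustration of Definition 1.3, the following minimal Python sketch draws an SE or SSE matrix by sampling, for each column independently, $d$ distinct row indices uniformly at random. The function names `random_sparse_expander` and `column_nonzeros` are hypothetical, and the choice of equiprobable signs for the SSE case is an assumption, since the definition only requires values from $\{-1, 1\}$.

```python
import random

def random_sparse_expander(n, N, d, signed=False, seed=None):
    """Draw an n x N matrix (list of rows) with exactly d nonzeros per column.

    The d row indices of each column are drawn uniformly at random without
    replacement, independently across columns. signed=False gives entries
    equal to 1 (SE); signed=True gives entries +1 or -1 with equal
    probability (SSE, equiprobable signs are an assumption of this sketch).
    """
    rng = random.Random(seed)
    A = [[0] * N for _ in range(n)]
    for col in range(N):
        for row in rng.sample(range(n), d):
            A[row][col] = rng.choice((-1, 1)) if signed else 1
    return A

def column_nonzeros(A):
    """Number of nonzero entries in each column of A."""
    return [sum(1 for row in A if row[col] != 0) for col in range(len(A[0]))]

A = random_sparse_expander(n=64, N=256, d=8, seed=0)
print(set(column_nonzeros(A)))  # every column has d = 8 nonzeros
```

A dense list-of-rows layout is used only for readability; for large $N$ one would store the $d$ row indices of each column instead.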

Definition 1.4: Consider a bipartite graph $G(U, V, E)$
where $E$ is the set of edges and $e_{ij} = (x_i, y_j)$ is the edge that
connects vertex $x_i$ to vertex $y_j$. For a given set of left vertices
$S \subset U$ its set of neighbors is $\Gamma(S) = \{y_j \mid x_i \in S \text{ and } e_{ij} \in E\}$.
In terms of the adjacency matrix, $A$, of $G(U, V, E)$ the
set of neighbors of $A_S$ for $|S| = s$, denoted by $A_s$, is the set
of rows with at least one nonzero.

Definition 1.5: Using Definition 1.4, the expansion of the
graph is given by the ratio $|\Gamma(S)|/|S|$, or equivalently, $|A_s|/s$.
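Definitions 1.4 and 1.5 can be made concrete with a short Python sketch. Here each column of the matrix is represented by the set of its nonzero row indices; this representation and the names `neighbor_set` and `expansion` are assumptions made for illustration.

```python
import random

def neighbor_set(column_supports, S):
    """Rows with at least one nonzero among the columns indexed by S."""
    rows = set()
    for j in S:
        rows |= column_supports[j]
    return rows

def expansion(column_supports, S):
    """Ratio |Gamma(S)| / |S| for a set S of left vertices (columns)."""
    return len(neighbor_set(column_supports, S)) / len(S)

rng = random.Random(0)
n, N, d = 1024, 4096, 8
# One uniformly random d-subset of the rows per column.
supports = [set(rng.sample(range(n), d)) for _ in range(N)]
S = rng.sample(range(N), 16)
print(len(neighbor_set(supports, S)), expansion(supports, S))
```

The expansion always lies between $d/|S|$ (all columns share one support) and $d$ (no collisions).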

By the definition of a lossless expander, Definition 1.1, we
need $|\Gamma(S)|$ to be large for every small $S \subset U$. In terms of
the class of matrices defined by Definition 1.3, for every $A_S$
we want to have $|A_s|$ as close to $n$ as possible, where $n$ is
the number of rows. Henceforth, we will only use the linear
algebra notation $A_s$, which is equivalent to $\Gamma(S)$. Note that
$|A_s|$ is a random variable depending on the draw of the set of
columns, $S$, for each fixed $A$. Therefore, we can ask what is
the probability that $|A_s|$ is not greater than $a_s$, in particular
where $a_s$ is smaller than the expected value of $|A_s|$. This is
the question that Theorem 1.6 answers. We then use this
theorem with RIC1 to deduce the corollaries that follow, which
are about the probabilistic construction of expander graphs, the
matrices we consider, and sampling theorems of some selected
compressed sensing algorithms.

Theorem 1.6: For fixed $s$, $n$, $N$ and $d$, let an $n \times N$ matrix
$A$ be drawn from either of the classes of matrices defined in
Definition 1.3, then

$$\mathrm{Prob}\left(|A_s| \le a_s\right) < p_{\max}(s,d) \times \exp\left[n \cdot \Psi(a_s, \ldots, a_2, d)\right] \tag{1}$$

where $p_{\max}(s,d)$ is given by

where $a_1 := d$. If no restriction is imposed on $a_s$ then the
$a_i$ for $i > 1$ take on their expected values $\hat{a}_i$ given by

$$\hat{a}_{2i} = \hat{a}_i\left(2 - \frac{\hat{a}_i}{n}\right) \quad \text{for } i = 1, 2, 4, \ldots, \lceil s/2 \rceil. \tag{4}$$

If $a_s$ is restricted to be less than $\hat{a}_s$, then the $a_i$ for $i > 1$ are
the unique solutions to the following polynomial system

$$a_{2i}^3 - 2a_i a_{2i}^2 + 2a_i^2 a_{2i} - a_i^2 a_{4i} = 0 \quad \text{for } i = 1, 2, 4, \ldots, \lceil s/4 \rceil \tag{5}$$

with $a_{2i} > a_i$ for each $i$.
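The expected values in (4) can be checked numerically. The sketch below assumes that (4) reads $\hat{a}_{2i} = \hat{a}_i(2 - \hat{a}_i/n)$ with $\hat{a}_1 = d$, and compares the iterates with the exact mean size of the union of $k$ independent uniform $d$-subsets of $[n]$, namely $n(1 - (1 - d/n)^k)$, which is a standard fact rather than a formula taken from the text. The function names are hypothetical.

```python
def expected_union_sizes(d, n, s):
    """Iterate a_hat_{2i} = a_hat_i * (2 - a_hat_i / n) from a_hat_1 = d.

    Returns {i: a_hat_i} for i = 1, 2, 4, ... up to the first power of
    two that is at least s.
    """
    a_hat = {1: float(d)}
    i = 1
    while i < s:
        a_hat[2 * i] = a_hat[i] * (2.0 - a_hat[i] / n)
        i *= 2
    return a_hat

def exact_expected_union(d, n, k):
    """Mean size of the union of k independent uniform d-subsets of [n]."""
    # A fixed row is missed by one subset with probability 1 - d/n.
    return n * (1.0 - (1.0 - d / n) ** k)

a_hat = expected_union_sizes(d=8, n=1024, s=512)
for i in sorted(a_hat):
    print(i, round(a_hat[i], 3), round(exact_expected_union(8, 1024, i), 3))
```

Under these assumptions the two columns printed agree, because doubling $i$ squares the probability $(1 - d/n)^i$ that a row is missed.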

Corollary 1.7: For fixed $s$, $n$, $N$, $d$ and $0 < \epsilon < 1/2$, let an
$n \times N$ matrix $A$ be drawn from the class of matrices defined
in Definition 1.3, then

$$\mathrm{Prob}\left(\|A_S x\|_1 \le (1 - 2\epsilon)d\|x\|_1\right) < p_{\max}(s,d) \times \exp\left[n \cdot \Psi(s, d, \epsilon)\right] \tag{6}$$

where $\Psi(s, d, \epsilon) = \Psi(a_s, \ldots, a_2, d)$ in (3) with $a_s = (1 - \epsilon)ds$
and $p_{\max}(s,d)$ is the polynomial in (2).

Theorem 1.6 and Corollary 1.7 allow us to calculate
$s, n, N, d, \epsilon$ where the probability of the probabilistic
constructions in Definition 1.3 not being a lossless $(s, d, \epsilon)$-expander
is exponentially small. For moderate values of $\epsilon$ this allows us
to make quantitative sampling theorems for some compressed
sensing reconstruction algorithms.

B. RIC1 and its implications to Compressed Sensing

In compressed sensing, and by extension in sparse
approximation, we observe the effect of the application of a
matrix to a vector of interest and we endeavor to recover this
vector of interest by exploiting the inherent simplicity in this
vector. Precisely, let $x \in \mathbb{R}^N$ be the vector of interest whose
simplicity is that it has $k < N$ nonzeros, which we refer to as
$k$-sparse; then we observe $y \in \mathbb{R}^n$ as the measurement vector
resulting from the multiplication of $x$ by an $n \times N$ matrix, $A$.
The minimum simplicity reconstruction of $x$ can be written as

$$\min_{x \in \chi^N} \|x\|_0 \quad \text{subject to} \quad Ax = y, \tag{7}$$

where $\chi^N$ is the set of all $k$-sparse vectors and $\|z\|_0$ counts
the nonzero components of $z$; this model may be reformulated
to include noise in the measurements. References [14], [15],
[16], [17] give detailed introductions to compressed sensing
and its applications, while [18], [19], [20], [21], [22], [5], [6],
[23], [4], [24] provide information on some of the popular
computationally efficient algorithms used to solve problem (7)
and its reformulations.

We are able to give guarantees on the quality of the
vector reconstructed from $A$ and $y$ by a variety of
reconstruction algorithms. One of these guarantees is a bound on the
approximation error between our recovered vector, say $\hat{x}$, and
the original vector by the best $k$-term representation error, i.e.
$\|\hat{x} - x\|_1 \le \mathrm{Const.}\,\|x - x_k\|_1$ where $x_k$ is the optimal $k$-term
representation for $x$. This is possible if $A$ has small RIC1,
in other words $A$ satisfies the $\ell_1$ restricted isometry property
(RIP-1), introduced by Berinde et al. in [5] and defined as
follows.

Definition 1.8 (RIP-1): Let $\chi^N$ be the set of all $k$-sparse
vectors, then an $n \times N$ matrix $A$ has RIP-1, with the lower
RIC1 being the smallest $L(k, n, N; A)$, when the following
condition holds.

$$(1 - L(k, n, N; A))\|x\|_1 \le \|Ax\|_1 \le \|x\|_1 \quad \forall x \in \chi^N \tag{8}$$

For computational purposes it is preferable to have $A$ sparse,
but little quantitative information on $L(k, n, N; A)$ has been
available for large sparse rectangular matrices. Berinde et
al. in [5] showed that scaled adjacency matrices of lossless
expander graphs (i.e. scaled SE matrices) satisfy RIP-1, and
the same proof extends to the signed adjacency matrices (i.e.
the so-called SSE matrices).

Theorem 1.9: If an $n \times N$ matrix $A$ is either SE or SSE
as defined in Definition 1.3, then $A/d$ satisfies RIP-1 with
$L(k, n, N; A) = 2\epsilon$.

Proof: The proof of the signed case (SSE) follows that of
the unsigned case (SE) in [5] but with absolute values included
in the appropriate stages. ■
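To make condition (8) concrete for the scaled matrix $A/d$, the following Python sketch evaluates the ratio $\|Ax\|_1 / (d\|x\|_1)$ for one random sparse vector and an SE matrix stored by column supports. It is only an empirical check for a single $x$, not a computation of $L(k, n, N; A)$; the name `l1_ratio` and the Gaussian nonzero values are assumptions of this sketch.

```python
import random

def l1_ratio(supports, x, d):
    """Return ||A x||_1 / (d * ||x||_1) for the SE matrix with the given supports.

    supports[j] lists the rows holding a 1 in column j; x maps a column
    index to the value of the corresponding nonzero of a sparse vector.
    """
    y = {}
    for col, value in x.items():
        for row in supports[col]:
            y[row] = y.get(row, 0.0) + value
    return sum(abs(v) for v in y.values()) / (d * sum(abs(v) for v in x.values()))

rng = random.Random(1)
n, N, d, k = 1024, 4096, 8, 20
supports = [rng.sample(range(n), d) for _ in range(N)]
x = {col: rng.gauss(0.0, 1.0) for col in rng.sample(range(N), k)}
ratio = l1_ratio(supports, x, d)
print(ratio)  # never exceeds 1; 1 - ratio is a lower bound on L for this A
```

The upper bound in (8) holds for every such matrix by the triangle inequality, since each column of $A/d$ has unit $\ell_1$ norm; only the lower bound depends on the expansion of the graph.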



Based on Theorem 1.9, which guarantees RIP-1, (8), for
the class of matrices in Definition 1.3, we give a bound, in
Corollary 1.10, for the probability that a random draw of a
matrix with $d$ 1s or $\pm 1$s in each column fails to satisfy the
lower bound of RIP-1 and hence fails to come from the class
of matrices given in Definition 1.3. In addition to Theorem
1.9, Corollary 1.10 follows from Theorem 1.6 and Corollary
1.7.

Corollary 1.10: Considering RIP-1, if $A$ is drawn from the
class of matrices in Definition 1.3 and $x$ is any $k$-sparse vector,
with $k$, $n$, $N$ and $0 < \epsilon < 1/2$ fixed, then

$$\mathrm{Prob}\left(\|Ax\|_1 \le (1 - 2\epsilon)d\|x\|_1\right) < p'_{\max}(N, k, d) \times \exp\left[N \cdot \Psi_{net}(k, n, N; d, \epsilon)\right] \tag{9}$$

where $p'_{\max}(N, k, d)$ and $\Psi_{net}$ are given by

p' max (N, fc, d) = 1 (10) 

le^d 3 (l - 

$$\Psi_{net}(k, n, N; d, \epsilon) = H\left(\frac{k}{N}\right) + \frac{n}{N}\Psi(k, d, \epsilon), \tag{11}$$

with $\Psi(k, d, \epsilon)$ defined in Corollary 1.7.

Furthermore, the following corollary is a consequence of
Corollary 1.10 and it is a sampling theorem on the existence
of lossless expander graphs. The proofs of Corollaries 1.10 and
1.11 are presented in Sections IV-B2 and IV-B3, respectively.

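Corollary 1.11: Consider $0 < \epsilon < 1/2$ and $d$ fixed. If $A$ is
drawn from the class of matrices in Definition 1.3 and $x$ is any
vector drawn from $\chi^N$, with $(k, n, N) \to \infty$ while $k/n \to \rho \in (0, 1)$
and $n/N \to \delta \in (0, 1)$, then for $\rho < (1 - \gamma)\rho^{\exp}(\delta; d, \epsilon)$ and
$\gamma > 0$

$$\mathrm{Prob}\left(\|Ax\|_1 \ge (1 - 2\epsilon)d\|x\|_1\right) \to 1 \tag{12}$$

exponentially in $n$, where $\rho^{\exp}(\delta; d, \epsilon)$ is the largest limiting
value of $k/n$ for which

$$H\left(\frac{k}{N}\right) + \frac{n}{N}\Psi(k, d, \epsilon) = 0. \tag{13}$$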

The outline of the rest of the manuscript is as follows: In
Section II we show empirical data to validate our main results
and also present lemmas (and their proofs) that are key to the
proof of the main theorem, Theorem 1.6. In Section III we
discuss restricted isometry constants and compressed sensing
algorithms. In Section IV we prove the main results, that
is Theorem 1.6 and the corollaries in Sections I-A and I-B.
Section V is the appendix where we present the alternative to
Theorem 1.6.

II. Discussion and derivation of the main results 

We present the method used to derive the main results and
discuss the validity and implications of the method. We start
by presenting in the next subsection, Section II-A, numerical
results that support the claims of the main results in Sections
I-A and I-B. This is followed in Section II-B with lemmas,
propositions and corollaries and their proofs.

A. Discussion on main results

Theorem 1.6 gives a bound on the probability that the
cardinality of a union of $k$ sets each with $d$ elements is less
than $\hat{a}_k$. Figure 2 shows plots of values of $a_k$ (size of set of
neighbors) for different $k$ taken over 500 realizations (in blue);
superimposed on these plots is the mean value of $a_k$ (in red)
and the $\hat{a}_k$ (in green). Similarly, Figure 3 shows values of
$a_k/k$ (the graph expansion), also taken over 500 realizations.
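An experiment of this kind can be sketched in a few lines of Python. The code below is not the authors' simulation; it is a minimal illustration that draws $k$ independent uniform $d$-subsets of $[n]$, records the size of their union over 500 trials, and compares the sample mean with the exact expectation $n(1 - (1 - d/n)^k)$ of the union size. The function name and the chosen values of $k$ are assumptions.

```python
import random

def simulate_union_sizes(n, d, k, trials, seed=0):
    """Sizes of the union of k independent uniform d-subsets of [n], per trial."""
    rng = random.Random(seed)
    sizes = []
    for _ in range(trials):
        rows = set()
        for _ in range(k):
            rows.update(rng.sample(range(n), d))
        sizes.append(len(rows))
    return sizes

n, d, trials = 1024, 8, 500
for k in (2, 16, 128, 512):
    sizes = simulate_union_sizes(n, d, k, trials)
    mean = sum(sizes) / trials
    expected = n * (1.0 - (1.0 - d / n) ** k)
    rel_err = abs(mean - expected) / expected
    print(k, round(mean, 2), round(expected, 2), round(rel_err, 5))
```

Dividing each recorded size by $k$ gives the corresponding samples of the graph expansion.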

[Figure 2: panel title "d = 8, n = 1024 and k = 2, 3, 4, ..., 512"; horizontal axis $k/n$.]

Fig. 2. For fixed $d = 8$ and $n = 2^{10}$, over 500 realizations we plot (in blue)
the cardinalities of the index sets of nonzeros for a given number of set sizes,
$k$. The dotted red curve is the mean of the simulations and the green squares are
the $\hat{a}_k$.



[Figure 3: panel title "d = 8, n = 1024 and k = 2, 3, 4, ..., 512"; horizontal axis $k/n$.]

Fig. 3. For fixed $d = 8$ and $n = 2^{10}$, over 500 realizations we plot (in
blue) the graph expansion for a given input set size $k$. The dotted red curve
is the mean of the simulations and the green squares are the $\hat{a}_k/k$.

Theorem 1.6 also claims that the $\hat{a}_s$ are the expected values
of the cardinalities of the union of $s$ sets. We give a brief
sketch of its proof in Section II-B in terms of maximum
likelihood and empirically illustrate the accuracy of
the result in Figure 4, where we show the relative error between
$\hat{a}_k$ and the mean values of the $a_k$, $\bar{a}_k$, realized over 500 runs,
to be less than $10^{-3}$.

Figure 5 shows representative values of $a_i$ from (5) for
$a_k := (1 - \epsilon)\hat{a}_k$ as a function of $\epsilon$ for $d = 8$, $k = 2 \times 10^3$,
and $n = 2^{20}$. Each of the $a_i$ decreases smoothly towards $d$,
but with $a_i$ for smaller values of $i$ varying less than for larger
values of $i$.

[Figure 4: panel title "d = 8, n = 1024 and k = 2, 4, 8, ..., 512".]

Fig. 4. For fixed $d = 8$ and $n = 2^{10}$, over 500 realizations the relative
error between the mean values of $a_k$ (referred to as $\bar{a}_k$) and the $\hat{a}_k$ from
Equation (4) of Theorem 1.6.




[Figure 5: curves of the $a_i$ plotted against $\epsilon$.]

Fig. 5. Values of $a_i$ as a function of $\epsilon \in [0, 1)$ for $a_k := (1 - \epsilon)\hat{a}_k$
with $d = 8$, $k = 2 \times 10^3$ and $n = 2^{20}$. For this choice of $d, k, n$
there are twelve levels of dyadic splits resulting in $a_i$ for $i = 2^j$ for
$j = 0, \ldots, \lceil \log_2 k \rceil = 12$. The highest curve corresponds to $a_i$ for $i = 2^{12}$,
the next highest curve corresponds to $i = 2^{11}$, and continuing in decreasing
magnitude with decreasing subscript values.

For fixed $0 < \epsilon < 1/2$ and for small but fixed $d$, $\rho^{\exp}(\delta; d, \epsilon)$
in Corollary 1.11 is a function of $\delta$ for each $d$ and $\epsilon$, and is
a phase transition function in the $(\delta, \rho)$ plane. Below the
curve of $\rho^{\exp}(\delta; d, \epsilon)$ the probability in (12) goes to one
exponentially in $n$ as the problem size grows. That is, if $A$
is drawn at random with $d$ 1s or $d$ $\pm 1$s in each column
and has parameters $(k, n, N)$ that fall below the curve of
$\rho^{\exp}(\delta; d, \epsilon)$, then we say it is from the class of matrices in
Definition 1.3 with probability approaching one exponentially
in $n$. In terms of $|\Gamma(X)|$ for $X \subset U$ and $|X| \le k$, Corollary
1.11 says that the probability that $|\Gamma(X)| \ge (1 - \epsilon)dk$ goes to
one exponentially in $n$ if the parameters of our graph lie
in the region below $\rho^{\exp}(\delta; d, \epsilon)$. This implies that if we
draw a random bipartite graph that has parameters in the
region below the curve of $\rho^{\exp}(\delta; d, \epsilon)$, then with probability
approaching one exponentially in $n$ that graph is a lossless
$(k, d, \epsilon)$-expander.

Figure 6 shows a plot of what $\rho^{\exp}(\delta; d, \epsilon)$
converges to for different values of $n$ with $\epsilon$ and $d$ fixed;
Figure 7 shows a plot of what $\rho^{\exp}(\delta; d, \epsilon)$ converges to for
different values of $d$ with $\epsilon$ and $n$ fixed; while Figure 8 shows
plots of what $\rho^{\exp}(\delta; d, \epsilon)$ converges to for different values of
$\epsilon$ with $n$ and $d$ fixed. It is interesting to note how increasing $d$
increases the phase transition up to a point, after which it decreases the
phase transition. Essentially, beyond $d = 16$ there is no gain
in increasing $d$. This vindicates the use of small $d$ in most
of the numerical simulations involving the class of matrices
considered here. Note the vanishing sparsity as the problem
size $(k, n, N)$ grows while $d$ is fixed to a small value of
8. In their GPU implementation [8], Blanchard and Tanner
observed that SSE with $d = 7$ has a phase transition for
numerous sparse approximation algorithms that is consistent
with dense Gaussian matrices, but with dramatically faster
implementation.

[Figure 6: phase transition curves for several values of $n$.]

Fig. 6. Phase transition plots of $\rho^{\exp}(\delta; d, \epsilon)$ for fixed $d = 8$ and $\epsilon = 1/4$
with $n$ varied.



[Figure 7: panel title "n = 1024, ε = 0.25"; legend entries include d = 12 and d = 16; horizontal axis $n/N$.]

Fig. 7. Phase transition plots of $\rho^{\exp}(\delta; d, \epsilon)$ for fixed $\epsilon = 1/6$ and $n = 2^{10}$
with $d$ varied.



As stated above, Corollary 1.11 follows from Theorem 1.6;
alternatively, Corollary 1.11 can be arrived at based on
probabilistic constructions of expander graphs given by Proposition
2.1 below. This proposition and its proof can be traced back
to Pinsker in [25] but more recent proofs can be found in [26],

ED- 

Proposition 2.1: For any $N/2 \ge k \ge 1$, $\epsilon > 0$ there exists
a lossless $(k, d, \epsilon)$-expander with

$$d = O\left(\log(N/k)/\epsilon\right) \quad \text{and} \quad n = O\left(k \log(N/k)/\epsilon^2\right).$$



[Figure 8: panel title "n = 1024, d = 8".]

Fig. 8. Phase transition plots of $\rho^{\exp}(\delta; d, \epsilon)$ for fixed $d = 8$ and $n = 2^{10}$
with $\epsilon$ varied.

To put our results in perspective, we compare them to the
alternative construction in [26], which led to Corollary 2.2,
whose proof is given in Section V-A of the Appendix. Figure 9
compares the phase transitions resulting from our construction
to that presented in [26]; we must point out, however, that
the proof in [26] was not aimed at a tight bound.

[Figure 9: panel title "n = 1024, d = 8, ε = 0.16667"; horizontal axis $n/N$.]

Fig. 9. A comparison of $\rho^{\exp}$ in Theorem 1.6 to $\rho^{\exp}_{bi}$ of Corollary 2.2
derived using the construction based on Corollary 2.2.

Corollary 2.2: Consider a bipartite graph $G(U, V, E)$ with
left vertices $|U| = N$, right vertices $|V| = n$ and left degree $d$.
Fix $0 < \epsilon < 1/2$ and $d$. As $(k, n, N) \to \infty$ while $k/n \to \rho \in (0, 1)$
and $n/N \to \delta \in (0, 1)$, then for $\rho < (1 - \gamma)\rho^{\exp}_{bi}(\delta; d, \epsilon)$
and $\gamma > 0$

$$\mathrm{Prob}\left(G \text{ fails to be an expander}\right) \to 0 \tag{14}$$

exponentially in $n$, where $\rho^{\exp}_{bi}(\delta; d, \epsilon)$ is the largest limiting
value of $k/n$ for which

$$\Psi(k, n, N; d, \epsilon) = 0 \tag{15}$$



with *(fc,„,A^ e ) = H(_J+_H(e) + — log i- 



B. Key Lemmas 

The following set of lemmas, propositions and corollaries
form the building blocks of the proof of our main results to
be presented in Section IV.

For one fixed set of columns of $A$, denoted $A_S$, the
probability in (1) can be understood in terms of the cardinality of
the union of the nonzeros in the columns. Our analysis of this
probability follows from nested unions of subsets using a
dyadic splitting technique. Given a starting set of columns,
we recursively split this set and the resulting sets into two
sets whose cardinalities are the ceiling and the floor of half the
cardinality of their union, until a level at which
the cardinalities are at most two. Resulting from this type
of splitting is a binary tree where the size of each child is
either the ceiling or the floor of half the size of its parent set. The
probability of interest becomes a product of the probabilities
involving all the children from the dyadic splitting of $A_S$.

The computation of the probability in (1) involves the
computation of the probability of the cardinality of the intersection
of two sets. This probability is given by Lemma 2.3 and
Corollary 2.4 below.

Lemma 2.3: Let $B, B_1, B_2 \subset [n]$ where $|B_1| = b_1$, $|B_2| = b_2$,
$B = B_1 \cup B_2$ and $|B| = b$. Also let $B_1$ and $B_2$ be drawn
uniformly at random, independent of each other, and define
$P_n(b, b_1, b_2) := \mathrm{Prob}\left(|B_1 \cap B_2| = b_1 + b_2 - b\right)$, then

$$P_n(b, b_1, b_2) = \binom{b_1}{b_1 + b_2 - b}\binom{n - b_1}{b - b_1}\binom{n}{b_2}^{-1}. \tag{16}$$

Proof: Given $B_1, B_2 \subset [n]$ where $|B_1| = b_1$ and
$|B_2| = b_2$ are drawn uniformly at random, independent of each
other, we calculate $\mathrm{Prob}\left(|B_1 \cap B_2| = z\right)$ where $z = b_1 + b_2 - b$.
Without loss of generality consider drawing $B_1$ first, then
the probability that the draw of $B_2$ intersecting $B_1$ will have
cardinality $z$, i.e. $\mathrm{Prob}\left(|B_1 \cap B_2| = z\right)$, is the size of the event
of drawing $B_2$ intersecting $B_1$ by $z$ divided by the size of
the sample space of drawing $B_2$ from $[n]$, which are given
by $\binom{b_1}{z} \cdot \binom{n - b_1}{b_2 - z}$ and $\binom{n}{b_2}$ respectively. Rewriting the division
as a product with the divisor raised to a negative power and
replacing $z$ by $b_1 + b_2 - b$ gives (16). ■
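The counting argument in this proof can be checked on small cases. The Python sketch below assumes that (16) reads $P_n(b, b_1, b_2) = \binom{b_1}{b_1+b_2-b}\binom{n-b_1}{b-b_1}\binom{n}{b_2}^{-1}$ and compares it with a brute-force enumeration of all draws of $B_2$ for a fixed $B_1$. The function names are hypothetical.

```python
from itertools import combinations
from math import comb

def p_n(n, b, b1, b2):
    """Probability that |B1 U B2| = b for independent uniform subsets of [n]."""
    z = b1 + b2 - b  # required size of the intersection
    if z < 0 or z > min(b1, b2) or b > n:
        return 0.0
    return comb(b1, z) * comb(n - b1, b - b1) / comb(n, b2)

def p_n_bruteforce(n, b, b1, b2):
    """Same probability by enumerating every b2-subset, with B1 held fixed."""
    B1 = set(range(b1))  # by symmetry the choice of B1 does not matter
    draws = list(combinations(range(n), b2))
    hits = sum(1 for B2 in draws if len(B1 | set(B2)) == b)
    return hits / len(draws)

n, b1, b2 = 7, 3, 4
for b in range(max(b1, b2), min(n, b1 + b2) + 1):
    print(b, p_n(n, b, b1, b2), p_n_bruteforce(n, b, b1, b2))
```

Summing `p_n` over all feasible $b$ gives one, which is the Vandermonde identity for the hypergeometric distribution.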
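Corollary 2.4: If two sets, $B_1, B_2 \subset [n]$, are drawn
uniformly at random, independent of each other, and $B = B_1 \cup B_2$, then

$$\mathrm{Prob}\left(|B| = b\right) = P_n(b, b_1, b_2) \times \mathrm{Prob}\left(|B_1| = b_1\right) \cdot \mathrm{Prob}\left(|B_2| = b_2\right). \tag{17}$$

Proof: $\mathrm{Prob}\left(|B| = b\right) = \mathrm{Prob}\left(|B_1 \cup B_2| = b\right)$ by definition.
As a consequence of the inclusion-exclusion principle

$$\mathrm{Prob}\left(|B_1 \cup B_2| = b\right) = \mathrm{Prob}\left(|B_1 \cap B_2| = b_1 + b_2 - b\right) \times \mathrm{Prob}\left(|B_1| = b_1\right) \cdot \mathrm{Prob}\left(|B_2| = b_2\right). \tag{18}$$

We use Lemma 2.3 to replace $\mathrm{Prob}\left(|B_1 \cap B_2| = b_1 + b_2 - b\right)$
in (18) by $P_n(b, b_1, b_2)$, leading to the required result. ■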

In the binary tree resulting from our dyadic splitting scheme
the numbers of columns in the two children of a parent
node are the ceiling and the floor of half of the number of
columns of the parent node. At each level of the split the
numbers of columns of the children of that level differ by
at most one. The enumeration of these two quantities at each level
of the splitting process is necessary in the computation of the
probability of (1). We state and prove what we refer to as a dyadic
splitting lemma, Lemma 2.5, which we later use to enumerate
these two quantities: the sizes (number of columns) of the
children and the number of children with a given size at each
level of the split.

Lemma 2.5: Let $S$ be an index set of cardinality $s$. For any
level $j$ of the dyadic splitting, $j = 0, \ldots, \lceil \log_2 s \rceil - 1$, the
set $S$ is decomposed into disjoint sets each having cardinality
$Q_j = \lceil s/2^j \rceil$ or $R_j = Q_j - 1$. Let $q_j$ sets have cardinality $Q_j$
and $r_j$ sets have cardinality $R_j$, then

$$q_j = s - 2^j \left\lceil \frac{s}{2^j} \right\rceil + 2^j, \quad \text{and} \quad r_j = 2^j - q_j. \tag{19}$$



Proof: At every node of the binary tree the children have
one of two sizes (numbers of columns), the floor and the ceiling
of half the size of their parent, and these sizes differ by at most
1; that is, at level $j$ of the splitting we have at most 2
different sizes. We define these sizes, $Q_j$ and $R_j$, in terms of
two arbitrary integers, $m_1$ and $m_2$, as follows.

$$Q_j = \frac{s}{2^j} + \frac{m_1}{2^j} \quad \text{and} \quad R_j = \frac{s}{2^j} + \frac{m_2}{2^j}. \tag{20}$$

Because of the nature of our splitting scheme we have $R_j = Q_j - 1$,
which implies that $m_1$ and $m_2$ must satisfy the relation

$$\frac{m_1 - m_2}{2^j} = 1. \tag{21}$$

Now let $q_j$ and $r_j$ be the number of children with $Q_j$ and $R_j$
columns, respectively. Therefore,

$$q_j + r_j = 2^j. \tag{22}$$

At each level $j$ of the splitting the following condition must
be satisfied

$$q_j \cdot Q_j + r_j \cdot R_j = s. \tag{23}$$

To find $m_1$, $m_2$, $q_j$ and $r_j$, from (20) we substitute for $Q_j$
and $R_j$ in (23) to have

$$q_j\left(\frac{s}{2^j} + \frac{m_1}{2^j}\right) + r_j\left(\frac{s}{2^j} + \frac{m_2}{2^j}\right) = s, \tag{24}$$

$$\frac{q_j s}{2^j} + \frac{q_j m_1}{2^j} + \frac{r_j s}{2^j} + \frac{r_j m_2}{2^j} = s, \tag{25}$$

$$(q_j + r_j)\frac{s}{2^j} + \frac{1}{2^j}\left(q_j m_1 + r_j m_2\right) = s, \tag{26}$$

$$s + \frac{1}{2^j}\left(q_j m_1 + r_j m_2\right) = s, \tag{27}$$

$$q_j m_1 + r_j m_2 = 0. \tag{28}$$

We expanded the brackets from (24) to (25) and simplified
from (25) to (26). We simplified the first term of (26) using
(22) to get (27), and we simplified this to get (28).
Equation (21) yields

$$m_1 = m_2 + 2^j. \tag{29}$$

Substituting this in (28) yields

$$q_j\left(m_2 + 2^j\right) + r_j m_2 = 0, \tag{30}$$

$$(q_j + r_j) m_2 + 2^j q_j = 0, \tag{31}$$

$$2^j\left(q_j + m_2\right) = 0. \tag{32}$$

From (30) to (31) we expanded the brackets and rearranged
the terms, and used (22) to simplify to (32). Using (32) and
(29) respectively we have

$$m_2 = -q_j \quad \text{and} \quad m_1 = 2^j - q_j = r_j. \tag{33}$$

Substituting this in (20) we have

$$Q_j = \frac{s + r_j}{2^j}. \tag{34}$$

Equating this value of $Q_j$ to its defined value in the statement
of the lemma gives

$$\frac{s + r_j}{2^j} = \left\lceil \frac{s}{2^j} \right\rceil. \tag{35}$$

Therefore, from (33) we use $r_j = 2^j - q_j$ to have

$$q_j = s - 2^j \left\lceil \frac{s}{2^j} \right\rceil + 2^j, \tag{36}$$

which concludes the proof. ■
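The counts in Lemma 2.5 can be verified by carrying out the dyadic splitting explicitly. The Python sketch below assumes that the closed form is $q_j = s - 2^j\lceil s/2^j\rceil + 2^j$ and $r_j = 2^j - q_j$, which is what the counting conditions $q_j + r_j = 2^j$ and $q_j Q_j + r_j R_j = s$ force when $R_j = Q_j - 1$. The function names are hypothetical.

```python
from collections import Counter

def split_level_sizes(s, j):
    """Sizes of the sets at level j when each set is split into ceil/floor halves."""
    sizes = [s]
    for _ in range(j):
        sizes = [half for size in sizes for half in ((size + 1) // 2, size // 2)]
    return sizes

def closed_form_counts(s, j):
    """Return (Q_j, R_j, q_j, r_j) from the closed form assumed above."""
    Q = -(-s // 2**j)            # integer ceiling of s / 2^j
    q = s - 2**j * Q + 2**j      # number of sets of size Q_j
    return Q, Q - 1, q, 2**j - q

s = 13
levels = (s - 1).bit_length()    # equals ceil(log2(s)) for s >= 1
for j in range(levels):
    print(j, sorted(split_level_sizes(s, j), reverse=True), closed_form_counts(s, j))
```

For $s = 13$ and $j = 2$, for example, the split gives sizes $4, 3, 3, 3$, matching $Q_2 = 4$, $q_2 = 1$ and $r_2 = 3$.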
The bound in (1) is derived using a large deviation analysis
of the nested probabilities which follow from the dyadic
splitting in Corollary 2.4. The large deviation analysis of (16) at
each stage involves its large deviation exponent $\psi_n(\cdot)$, which
follows from Stirling's inequality bounds on the combinatorial
product of (16). Lemma 2.6 establishes a few properties of
$\psi_n(\cdot)$ while Lemma 2.7 shows how the various $\psi_n(\cdot)$'s at a
given dyadic splitting level can be combined into a relatively
simple expression.
Lemma 2.6: Define

$$\psi_n(x, y, z) := y \cdot H\left(\frac{x - z}{y}\right) + (n - y) \cdot H\left(\frac{x - y}{n - y}\right) - n \cdot H\left(\frac{z}{n}\right), \tag{37}$$

then for $n > x > y$ we have that

$$\text{for } y > z \quad \psi_n(x, y, y) < \psi_n(x, y, z) < \psi_n(x, z, z); \tag{38}$$

$$\text{for } x > z \quad \psi_n(x, y, y) > \psi_n(z, y, y); \tag{39}$$

$$\text{for } 1/2 < \alpha < 1 \quad \psi_n(x, y, y) < \psi_n(\alpha x, \alpha y, \alpha y). \tag{40}$$


Proof: We start with Property (38) and first show that the
left inequality holds. If we substitute $y$ for $z$ in (37) with $y > z$
we reduce the first and last terms of (37) while we increase the
middle term of (37), which makes $\psi_n(x, y, y) < \psi_n(x, y, z)$.
For the second inequality, if we replace $y$ by $z$ in (37) with $y > z$
we increase the first and the last terms of (37) and reduce
the middle term, which makes $\psi_n(x, y, z) < \psi_n(x, z, z)$. This
concludes the proof for (38).

Property (39) states that for fixed $y$, $\psi_n(x, y, y)$ is
monotonically increasing in its first argument. To prove (39) we
use the condition $n > x > y$ to ensure that $H(p)$ increases
monotonically with $p$, which implies that the first and last
terms of (37) increase with $x$ for fixed $y$ while the second
term remains constant.



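Property (40) means that $\psi_n(x, y, y)$ is monotonically
decreasing in $x$ and $y$. For the proof we show that for $1/2 < \alpha < 1$
the difference $\psi_n(\alpha x, \alpha y, \alpha y) - \psi_n(x, y, y) > 0$. Using (37)
we write out clearly what the difference, $\psi_n(\alpha x, \alpha y, \alpha y) - \psi_n(x, y, y)$,
is as follows.

$$\alpha y H\left(\frac{\alpha x - \alpha y}{\alpha y}\right) + (n - \alpha y) H\left(\frac{\alpha x - \alpha y}{n - \alpha y}\right) - n H\left(\frac{\alpha y}{n}\right) - y H\left(\frac{x - y}{y}\right) - (n - y) H\left(\frac{x - y}{n - y}\right) + n H\left(\frac{y}{n}\right) \tag{41}$$

$$= \alpha y H\left(\frac{x - y}{y}\right) + n H\left(\frac{\alpha x - \alpha y}{n - \alpha y}\right) - \alpha y H\left(\frac{\alpha x - \alpha y}{n - \alpha y}\right) - n H\left(\frac{\alpha y}{n}\right) - y H\left(\frac{x - y}{y}\right) - n H\left(\frac{x - y}{n - y}\right) + y H\left(\frac{x - y}{n - y}\right) + n H\left(\frac{y}{n}\right) \tag{42}$$

$$= \alpha y H\left(\frac{x - y}{y}\right) - \alpha y H\left(\frac{\alpha x - \alpha y}{n - \alpha y}\right) - y H\left(\frac{x - y}{y}\right) + y H\left(\frac{x - y}{n - y}\right)$$
$$\quad + n H\left(\frac{\alpha x - \alpha y}{n - \alpha y}\right) + n H\left(\frac{y}{n}\right) - n H\left(\frac{\alpha y}{n}\right)$$
$$\quad - n H\left(\frac{x - y}{n - y}\right). \tag{43}$$

From (41) to (42) we expanded brackets and simplified, while
from (42) to (43) we rearranged the terms for easy comparison.

Again $n > x > y$ ensures that the arguments of $H(\cdot)$ are
strictly less than half and $H(p)$ increases monotonically with
$p$. In (43) the difference of the first two terms in the first row
is positive while the difference of the second two terms is
negative. However, the whole sum of the first four terms is
negative but very close to zero when $\alpha$ is close to one, which
is the regime that we will be considering. The difference of
the last two terms in the second row is positive while the
difference of the first term of the second row and the term on
the bottom row is negative, but due to
the concavity and steepness of the Shannon entropy function
the first positive difference is larger, hence the sum of the last four
terms is positive. Since we can write $n = cy$ with $c > 1$ being
an arbitrary constant, the positive sum in the second
four terms dominates the negative sum in the first four terms.
This gives the required result and hence concludes this proof
and the proof of Lemma 2.6. ■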
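The role of $\psi_n$ as a large deviation exponent can be illustrated numerically. The Python sketch below assumes that (37) reads $\psi_n(x,y,z) = y H((x-z)/y) + (n-y) H((x-y)/(n-y)) - n H(z/n)$ with $H$ the Shannon entropy in natural logarithms, and that (16) is the product of binomial coefficients $\binom{b_1}{b_1+b_2-b}\binom{n-b_1}{b-b_1}\binom{n}{b_2}^{-1}$. It compares $\log P_n(b, b_1, b_2)$ with $\psi_n(b, b_1, b_2)$; the standard bounds $mH(k/m) - \log(m+1) \le \log\binom{m}{k} \le mH(k/m)$ imply that the two differ by at most $2\log(n+1)$. The function names are hypothetical.

```python
from math import comb, log

def shannon_entropy(p):
    """Shannon entropy with natural logarithms; H(0) = H(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log(p) - (1.0 - p) * log(1.0 - p)

def psi_n(n, x, y, z):
    """Assumed form of the exponent in (37); requires 0 < y < n."""
    H = shannon_entropy
    return y * H((x - z) / y) + (n - y) * H((x - y) / (n - y)) - n * H(z / n)

def log_p_n(n, b, b1, b2):
    """Logarithm of the assumed form of (16)."""
    return (log(comb(b1, b1 + b2 - b)) + log(comb(n - b1, b - b1))
            - log(comb(n, b2)))

n, b1, b2, b = 1024, 60, 60, 115
print(log_p_n(n, b, b1, b2), psi_n(n, b, b1, b2), 2 * log(n + 1))
```

The polynomial prefactors that this comparison ignores are what the factor $p_{\max}(s,d)$ in Theorem 1.6 accounts for.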

Lemma 2.7: Given $\psi_n(\cdot)$ as defined in (37), the
following bound holds.

rio g2 ( S )i-2 r 

E 

J=0 



Tj ■ i>n \ Ojjj^^-j iO^J 



riog 2 (s)l-l 



< 



^2 23 ■ i>n [a Qj ,a^ya^^j , (44) 



wher e Q « r i°g 2 wi-i = d - 



Proof: The quantity inside the left hand side summation 



in (O, i.e. 



1j ■ i>n [ a Qj ,a^ya^^ 



(45) 



is equal to the following if we replace qj and Tj by their values 
given in Lemma [ 



(s - 2 3 
< 2 :i 



(46) 



< 



< 



< 



s - V 
2 J 



s-2 J 
2 : > 



s - 2? 

23 



- ■ ^ n ^a Bj , a ^ j , a ^ j ^ . 

- • r/'n ^a Bj , a ^ j , a ^ j ^ . 
-s^j -ip n (a Q] , a ^ ^ j , a ^ j 

From (l46l l to (l47l i we upper bounded 

V'n faQj , a j-oq , a ^ j ^ by Vn f a Qj . , a j^j , a 



(47) 



(48) 



(49) 



(50) 



and Vi n ^a flj ,q|-Rj-|,a|flj j j by -0n ( or, , a ^ j , a ^ Rj_ j 
using ( f38b of Lemma 12.61 We then upper bounded 

i> n ya-Qj , a | Qj_ j , Q ^ by ^ n ^a Qj , a | I ,Q| | ^ , 

from dUl) to (@8]l, again using (138b of Lemma 12.61 From 
(|48T > to ||49), using <[39j of Lemma |X6] we bounded 

$n y a Ri> a [?±^ '"L-^-j) by ^ n ("Qi' a [^iJ' a [iJ Y 

For the final step from d49l ) to (l50l we factored out 
^ a Qj = a |^fiJ^ anc ^ tnen simplified. 

Using $q_{\lceil\log_2 s\rceil-1} + r_{\lceil\log_2 s\rceil-1} = 2^{\lceil\log_2 s\rceil-1}$ we bound $q_{\lceil\log_2 s\rceil-1}$ by $2^{\lceil\log_2 s\rceil-1}$. Then we add this to the summation of (49) for $j = 0, \ldots, \lceil\log_2 s\rceil-2$, establishing the bound of Lemma 2.7. ■

Now we state and prove a lemma about the quantities $a_i$. During the proof we will make a statement about the $a_i$ using their expected values $\hat a_i$, which follows from a maximum likelihood analogy.

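Lemma 2.8: The problem

$$\max_{a_s,\ldots,a_2} \sum_{i=1}^{\lceil s/2\rceil} \frac{s}{2i}\,\psi_n(a_{2i}, a_i, a_i) \qquad (51)$$

has a global maximum and the maximum occurs at the expected values of the $a_i$, $\hat a_i$, given by

$$\hat a_{2i} = \hat a_i\left(2 - \frac{\hat a_i}{n}\right) \quad \text{for } i = 1, 2, 4, \ldots, \lceil s/2\rceil, \qquad (52)$$

which are a solution of the following polynomial system.

$$a_{\lceil s/2\rceil}^2 - 2n\,a_{\lceil s/2\rceil} + n\,a_s = 0,$$
$$a_{2i}^3 - 2a_i a_{2i}^2 + 2a_i^2 a_{2i} - a_i^2 a_{4i} = 0 \quad \text{for } i = 1, 2, 4, \ldots, \lceil s/4\rceil, \qquad (53)$$

where $a_1 = d$. If $a_s$ is constrained to be less than $\hat a_s$, then there is a different global maximum; instead the $a_i$ satisfy the following system

$$a_{2i}^3 - 2a_i a_{2i}^2 + 2a_i^2 a_{2i} - a_i^2 a_{4i} = 0 \quad \text{for } i = 1, 2, 4, \ldots, \lceil s/4\rceil, \qquad (54)$$

again with $a_1 = d$.

Proof: Define

$$\Psi_n(a_s,\ldots,a_2,d) := \sum_{i=1}^{\lceil s/2\rceil} \frac{s}{2i}\,\psi_n(a_{2i}, a_i, a_i). \qquad (55)$$

Using the definition of $\psi_n(\cdot)$ in (37) we therefore have

$$\Psi_n(a_s,\ldots,a_2,d) = \sum_{i=1}^{\lceil s/2\rceil} \frac{s}{2i}\left[(n-a_i)\,H\!\left(\frac{a_{2i}-a_i}{n-a_i}\right) + a_i\,H\!\left(\frac{a_{2i}-a_i}{a_i}\right) - n\,H\!\left(\frac{a_i}{n}\right)\right]. \qquad (56)$$

The gradient of $\Psi_n(a_s,\ldots,a_2,d)$, $\nabla\Psi_n(a_s,\ldots,a_2,d)$, is given by

$$\left(\log\frac{(2a_{\lceil s/2\rceil}-a_s)(n-a_s)}{(a_s-a_{\lceil s/2\rceil})^2},\ \ \frac{s}{2i}\log\frac{a_{2i}\,(a_{4i}-a_{2i})\,(2a_i-a_{2i})}{(2a_{2i}-a_{4i})\,(a_{2i}-a_i)^2}\right)^T \quad \text{for } i = 1, 2, 4, \ldots, \lceil s/4\rceil, \qquad (57)$$

where $v^T$ is the transpose of the vector $v$. Obtaining the critical points by solving $\nabla\Psi_n(a_s,\ldots,a_2,d) = 0$ leads to the polynomial system (53).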

The Hessian, $\nabla^2\Psi_n(a_s,\ldots,a_2,d)$, at these optimal $a_i$, which are the solutions to the polynomial system (53), is negative definite, which implies that this unique critical point is a global maximum point. Let the solution of the system be the $\hat a_i$; then they satisfy the recurrence formula (52), which is equivalent to their expected values as explained in the paragraph that follows.

We estimate the uniformly distributed parameter relating $a_{2i}$ to $a_i$. The best estimator of this parameter is the maximum likelihood estimator, which we calculate from the maximum log-likelihood estimator (MLE). The summation of the $\psi_n(\cdot)$ is the logarithm of the joint density functions for the $a_{2i}$. The MLE is obtained by maximizing this summation and it corresponds to the expected log-likelihood. Therefore, the parameters given implicitly by (52) are the expected log-likelihood, which implies that the values of the $\hat a_i$ in (52) are the expected values of the $a_i$.
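
As a quick sanity check of Lemma 2.8, the sketch below iterates the recurrence (52) in exact rational arithmetic and confirms that the resulting values satisfy the polynomial system (53). The function names, the choice $n = 1024$, $d = 8$, $s = 16$, and the restriction to $s$ a power of two are assumptions made for this illustration. The closed form $n(1-(1-d/n)^i)$ in the last check is not stated in the text; it follows from (52) by induction.

```python
from fractions import Fraction


def expected_cardinalities(n, d, s):
    """Iterate (52): a_{2i} = a_i * (2 - a_i / n), starting from a_1 = d.

    Assumes s is a power of two so that every index i = 1, 2, 4, ..., s exists.
    """
    a = {1: Fraction(d)}
    i = 1
    while 2 * i <= s:
        a[2 * i] = a[i] * (2 - a[i] / n)
        i *= 2
    return a


def residuals_of_system_53(a, n, s):
    """Left-hand sides of the polynomial system (53); all should be zero."""
    half = s // 2
    res = [a[half] ** 2 - 2 * n * a[half] + n * a[s]]
    i = 1
    while 4 * i <= s:
        res.append(
            a[2 * i] ** 3
            - 2 * a[i] * a[2 * i] ** 2
            + 2 * a[i] ** 2 * a[2 * i]
            - a[i] ** 2 * a[4 * i]
        )
        i *= 2
    return res


n, d, s = 1024, 8, 16  # illustrative sizes chosen for this sketch
a_hat = expected_cardinalities(n, d, s)
assert all(r == 0 for r in residuals_of_system_53(a_hat, n, s))
# Closed form implied by the recurrence: a_i = n * (1 - (1 - d/n)**i).
assert all(a_hat[i] == n * (1 - (1 - Fraction(d, n)) ** i) for i in a_hat)
```

Exact fractions are used so that the residuals vanish identically rather than up to rounding error.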
If we restrict $a_s$ to take a fixed value, then $\nabla\Psi_n(a_s,\ldots,a_2,d)$ is given by

$$\left(\frac{s}{2i}\log\frac{a_{2i}\,(a_{4i}-a_{2i})\,(2a_i-a_{2i})}{(2a_{2i}-a_{4i})\,(a_{2i}-a_i)^2}\right)^T \quad \text{for } i = 1, 2, 4, \ldots, \lceil s/4\rceil. \qquad (58)$$

Obtaining the critical points by solving $\nabla\Psi_n(a_s,\ldots,a_2,d) = 0$ leads to the polynomial system (54).

Given $a_s$, the Hessian, $\nabla^2\Psi_n(a_s,\ldots,a_2,d)$, at these optimal $a_i$, which are the solutions to the polynomial system (54), is negative definite, which implies that this unique critical point is a global maximum; this case differs from a maximum likelihood estimation because of the extra constraint of fixing $a_s$. ■

The dyadic splitting technique we employ requires greater care of the polynomial term in the large deviation bound of $P_n(x,y,z)$ in (16); Lemma 2.10 establishes the polynomial term.

Definition 2.9: $P_n(x,y,z)$ defined in (16) satisfies the upper bound

$$P_n(x,y,z) \le \pi(x,y,z)\,\exp(\psi_n(x,y,z)) \qquad (59)$$

with bounds of $\pi(x,y,z)$ given in Lemma 2.10.

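Lemma 2.10: For $\pi(x,y,z)$ and $P_n(x,y,z)$ given by (59) and (16) respectively, if $\{y,z\} < x < y+z$, $\pi(x,y,z)$ is given by

$$\left(\frac{5}{4}\right)^{4}\left[\frac{yz(n-y)(n-z)}{2\pi n\,(y+z-x)(x-y)(x-z)(n-x)}\right]^{1/2}, \qquad (60)$$

otherwise $\pi(x,y,z)$ has the following cases.

$$\left(\frac{5}{4}\right)^{3}\left[\frac{y(n-z)}{n(y-z)}\right]^{1/2} \quad \text{if } x = y > z; \qquad (61)$$
$$\left(\frac{5}{4}\right)^{3}\left[\frac{(n-y)(n-z)}{n(n-y-z)}\right]^{1/2} \quad \text{if } x = y+z; \qquad (62)$$
$$\left(\frac{5}{4}\right)^{2}\left[\frac{2\pi z(n-z)}{n}\right]^{1/2} \quad \text{if } x = y = z. \qquad (63)$$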



Proof: Stirling's inequality below will be used in this proof and in other proofs to follow.

$$\frac{16}{25}\,\bigl(2\pi p(1-p)N\bigr)^{-1/2} e^{N H(p)} \le \binom{N}{pN} \le \frac{5}{4}\,\bigl(2\pi p(1-p)N\bigr)^{-1/2} e^{N H(p)}, \qquad (64)$$

where $H(p) = -p\log(p) - (1-p)\log(1-p)$ is the Shannon entropy function for base $e$ logarithms.
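
The two-sided bound can be checked numerically. The sketch below assumes the constants 16/25 and 5/4 for the lower and upper inequalities of (64) and evaluates both sides for every interior value $k = pN$ with $2 \le N \le 60$; the function names are illustrative only.

```python
import math


def shannon_entropy(p):
    """H(p) = -p log p - (1 - p) log(1 - p) with natural logarithms."""
    return -p * math.log(p) - (1 - p) * math.log(1 - p)


def stirling_bounds(N, k):
    """Lower and upper bounds on binom(N, k) in the form of (64), with p = k / N."""
    p = k / N
    core = math.exp(N * shannon_entropy(p)) / math.sqrt(2 * math.pi * p * (1 - p) * N)
    return 16 / 25 * core, 5 / 4 * core


# Only interior k is checked: p = 0 and p = 1 make the polynomial factor singular.
for N in range(2, 61):
    for k in range(1, N):
        lower, upper = stirling_bounds(N, k)
        assert lower <= math.comb(N, k) <= upper
```

The polynomial factor $(2\pi p(1-p)N)^{-1/2}$ is what later becomes $\pi(x,y,z)$, while $e^{NH(p)}$ feeds the exponent $\psi_n$.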

From Definition 2.9 the quantity $\pi(x,y,z)$ is the polynomial portion of the large deviation upper bound. Within this proof we express this by

$$\pi(x,y,z) = \mathrm{poly}\left[\binom{y}{y+z-x}\binom{n-y}{x-y}\binom{n}{z}^{-1}\right]. \qquad (65)$$

We derive the upper bound $\pi(x,y,z)$ using Stirling's inequality. The right inequality of (64) is used to upper bound $\binom{y}{y+z-x}$ and $\binom{n-y}{x-y}$, and the left inequality of (64) is used to lower bound $\binom{n}{z}$. If $\{y,z\} < x < y+z$ the bound is well defined and simplifies to (60).

If $x = y > z$, (60) is undefined; however, substituting $y$ for $x$ in (65) gives $\binom{y}{y+z-x} = \binom{y}{z}$ and $\binom{n-y}{x-y} = \binom{n-y}{0} = 1$. We upper bound the product $\binom{y}{z}\binom{n}{z}^{-1}$ using the right inequality in (64) to bound $\binom{y}{z}$ from above and the left inequality in (64) to bound $\binom{n}{z}$ from below. The resulting polynomial part of the product simplifies to (61).

If $x = y+z$, then $\binom{y}{y+z-x} = \binom{y}{0} = 1$ and $\binom{n-y}{x-y} = \binom{n-y}{z}$. As above, we upper bound the product of $\binom{n-y}{z}$ and $\binom{n}{z}^{-1}$ using (64) and simplify the polynomial part of this product to get (62). If instead $x = y = z$, then $\binom{y}{y+z-x} = \binom{y}{y}$ and $\binom{n-y}{x-y} = \binom{n-y}{0}$, both of which equal 1. Therefore the bound only involves $\binom{n}{z}^{-1}$, which we bound using (64), and the resulting polynomial part simplifies to (63). ■
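
The interior case of Lemma 2.10 can be illustrated numerically. The sketch below takes $P_n(x,y,z)$ to be the ratio of binomials appearing in (65) and takes $\psi_n(x,y,z)$ to be the exponent obtained by applying (64) to each binomial; both forms, as well as the constant $(5/4)^4$ in (60), are assumptions of this example since (16) and (37) are stated elsewhere. All function names are hypothetical.

```python
import math


def H(p):
    """Shannon entropy with natural logarithms."""
    return -p * math.log(p) - (1 - p) * math.log(1 - p)


def P_n(n, x, y, z):
    """Ratio of binomials whose polynomial part is described by (65)."""
    return math.comb(y, y + z - x) * math.comb(n - y, x - y) / math.comb(n, z)


def psi_n(n, x, y, z):
    """Assumed exponent: the entropy terms that (64) assigns to each binomial."""
    return y * H((x - z) / y) + (n - y) * H((x - y) / (n - y)) - n * H(z / n)


def pi_interior(n, x, y, z):
    """Polynomial factor (60), valid for max(y, z) < x < y + z and x < n."""
    num = y * z * (n - y) * (n - z)
    den = 2 * math.pi * n * (y + z - x) * (x - y) * (x - z) * (n - x)
    return (5 / 4) ** 4 * math.sqrt(num / den)


n = 100  # illustrative number of rows
for y in range(2, 30):
    for z in range(2, y + 1):
        for x in range(max(y, z) + 1, y + z):
            bound = pi_interior(n, x, y, z) * math.exp(psi_n(n, x, y, z))
            assert P_n(n, x, y, z) <= bound
```

The boundary cases (61) to (63) are excluded here because at least one binomial degenerates to 1 and (60) is then undefined.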

Corollary 2.11: If $n > 2y$, then $\pi(y,y,y)$ is monotonically increasing in $y$.

Proof: If $n > 2y$, (63) implies that $\pi(y,y,y)$ is proportional to $\sqrt{y}$, i.e. $\pi(y,y,y) = c\sqrt{y}$, with $c > 0$, and $c\sqrt{y}$ is monotonic in $y$. ■

III. Restricted isometry constants and 
Compressed Sensing algorithms 

Here we introduce RIC$_2$ and briefly discuss the implications of RIC$_1$ and RIC$_2$ to compressed sensing algorithms in Section III-A. In Section III-B we present the first ever quantitative comparison of the performance guarantees of some of the compressed sensing algorithms proposed for sparse matrices as stated in Definition 1.3.



A. Restricted isometry constants 

It is possible to include noise in the Compressed Sensing model, for instance $y = Ax + e$ where $e$ is a noise vector capturing the model misfit or the non-sparsity of the signal $x$. The $\ell_0$-minimization problem stated earlier becomes, in the noise case setting,

$$\min_{x \in \chi^N} \|x\|_0 \quad \text{subject to} \quad \|Ax - y\|_2 < \|e\|_2, \qquad (66)$$

where $\|e\|_2$ is the magnitude of the noise.

The noiseless $\ell_0$-minimization problem and (66) are in general NP-hard and hence intractable. To benefit from the rich literature of algorithms available in both convex and non-convex optimization, the $\ell_0$-minimization problem is relaxed to an $\ell_p$-minimization one for $0 < p \le 1$. It is well known that the $\ell_p$ norm
for < p < 1 are sparsifying norms, see [19], ll2D . In 
addition, there are specifically designed classes of algorithms that take on the $\ell_0$ problem and they have been referred to as greedy algorithms. When using dense sensing matrices, $A$, popular greedy algorithms include Normalized Iterative Hard Thresholding (NIHT), [27], Compressive Sampling Matching Pursuit (CoSaMP), [22], and Subspace Pursuit (SP), [20]. When $A$ is sparse and non-mean zero, a different set of combinatorial greedy algorithms has been proposed which iteratively locate and eliminate large (in magnitude) components of the vector, [5]. They include Expander Matching Pursuit (EMP), [28], Sparse Matching Pursuit (SMP), [29], Sequential Sparse Matching Pursuit (SSMP), [30], Left Degree Dependent Signal Recovery (LDDSR), [24], and Expander Recovery (ER), [23], [7].

The convergence analysis of nearly all of these algorithms relies heavily on restricted isometry constants (RIC). As we saw earlier, RICs measure how near an isometry $A$ is when applied to $k$-sparse vectors in some norm. For the $\ell_1$ norm, also known as the Manhattan norm, RIC$_1$ was stated earlier. The restricted Euclidean norm isometry, introduced by Candès in [31], is denoted by RIC$_2$ and is defined in Definition 3.1.

Definition 3.1 (RIC$_2$): Define $\chi^N$ to be the set of all $k$-sparse vectors and draw an $n \times N$ matrix $A$; then for all $x \in \chi^N$, $A$ has RIC$_2$, with lower and upper RIC$_2$, $L(k,n,N;A)$ and $U(k,n,N;A)$ respectively, when the following holds.

$$(1 - L(k,n,N;A))\,\|x\|_2 \le \|Ax\|_2 \le (1 + U(k,n,N;A))\,\|x\|_2.$$

The computation of RIC$_1$ for adjacency matrices of lossless $(k,d,\epsilon)$-expander graphs is equivalent to calculating $\epsilon$. The computation of RIC$_2$ is intractable except for trivially small problem sizes $(k,n,N)$ because it involves doing a combinatorial search over all $\binom{N}{k}$ column submatrices of $A$. As a result, attempts have been made to derive RIC$_2$ bounds. Some of these attempts have been successful in deriving RIC$_2$ bounds for the Gaussian ensemble, and these bounds have evolved from the first by Candès and Tao in [19], improved by Blanchard, Cartis and Tanner in [32] and further improved by Bah and Tanner in [33].
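
To make the statement about RIC$_1$ concrete, the sketch below computes the expansion parameter of a tiny left-regular bipartite graph by exhaustive enumeration. It assumes the usual lossless expander condition $|N(S)| \ge (1-\epsilon)d|S|$ for all column sets with $|S| \le k$, and the toy graph and function name are hypothetical. The enumeration is itself exponential in $k$, so this is only viable for very small graphs.

```python
from fractions import Fraction
from itertools import combinations


def expansion_parameter(neighbors, k):
    """Smallest eps with |N(S)| >= (1 - eps) * d * |S| for all 1 <= |S| <= k.

    neighbors[j] is the set of rows holding the d ones of column j.
    """
    d = len(neighbors[0])
    eps = Fraction(0)
    for size in range(1, k + 1):
        for S in combinations(range(len(neighbors)), size):
            covered = set().union(*(neighbors[j] for j in S))
            eps = max(eps, 1 - Fraction(len(covered), d * size))
    return eps


# Toy graph (hypothetical): three columns of degree 2, adjacent columns share one row.
toy = [{0, 1}, {1, 2}, {2, 3}]
print(expansion_parameter(toy, 2))  # 1/4
```

Columns with pairwise disjoint neighbor sets would give $\epsilon = 0$, the ideal case.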

RIC$_2$ bounds have been used to derive sampling theorems for compressed sensing algorithms: $\ell_1$-minimization and the greedy algorithms for dense matrices, NIHT, CoSaMP, and SP. Using the phase transition framework with RIC$_2$ bounds, Blanchard et al. compared the performance of these algorithms in [13]. In a similar vein, as another key contribution of this paper we provide sampling theorems for $\ell_1$-minimization and combinatorial greedy algorithms, EMP, SMP, SSMP, LDDSR and ER, proposed for SE and SSE matrices.



B. Algorithms and their performance guarantees 

Theoretical guarantees have been given for $\ell_1$ recovery and other greedy algorithms including EMP, SMP, SSMP, LDDSR and ER designed to do compressed sensing recovery with adjacency matrices of lossless expander graphs and by extension SSE matrices. Sparse matrices have been observed to have recovery properties comparable to dense matrices for $\ell_1$-minimization and some of the aforesaid algorithms,
see 0, ED, ED, 0, (24) and the references therein. Base 
on theoretical guarantees, we derived sampling theorems and present here phase transition curves, which are plots of phase transition functions $\rho^{alg}(\delta; d, \epsilon)$ of algorithms such that for $k/n \to \rho < (1-\gamma)\rho^{alg}(\delta; d, \epsilon)$, $\gamma > 0$, a given algorithm is guaranteed to recover all $k$-sparse signals with overwhelming probability approaching one exponentially in $n$.



1) $\ell_1$-minimization: Note that $\ell_1$-minimization is not an algorithm per se, but can be solved using Linear Programming (LP) algorithms. Berinde et al. showed in [5] that $\ell_1$-minimization can be used to perform signal recovery with binary matrices coming from expander graphs. We reproduce the formal statement of this guarantee in the following theorem,
the proof of which can be found in (5), iRfl . 

Theorem 3.2 (Theorem 3, £5;/, Theorem 1, fiflj): Let A be 
an adjacency matrix of a lossless $(k,d,\epsilon)$-expander graph with $\alpha(\epsilon) = 2\epsilon/(1-2\epsilon) < 1/2$. Given any two vectors $x$, $\hat x$ such that $Ax = A\hat x$, and $\|\hat x\|_1 \le \|x\|_1$, let $x_k$ be the $k$ largest (in magnitude) coefficients of $x$, then

$$\|x - \hat x\|_1 \le \frac{2}{1 - 2\alpha(\epsilon)}\,\|x - x_k\|_1. \qquad (67)$$

The condition that $\alpha(\epsilon) = 2\epsilon/(1-2\epsilon) < 1/2$ implies the sampling theorem stated as Corollary 3.3, which when satisfied ensures a positive upper bound in (67). The resulting sampling theorem is given by $\rho^{\ell_1}(\delta; d, \epsilon)$ using $\epsilon = 1/6$ from Corollary

E2 

Corollary 3.3 (SSJH): £1 -minimization is guaranteed to re- 
cover any fc-sparse vector from its linear measurement by an 
adjacency matrix of a lossless (fc, d, e)-expander graph with 
e < 1/6. 

Proof: Setting the denominator of the fraction on the right hand side of (67) to be greater than zero, i.e. $1 - 2\alpha(\epsilon) > 0$, gives the required result. ■
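
The arithmetic behind the threshold $\epsilon < 1/6$ can be traced with exact fractions. The helper names below are illustrative and not part of the original text.

```python
from fractions import Fraction


def alpha(eps):
    """alpha(eps) = 2 eps / (1 - 2 eps) as in Theorem 3.2."""
    return 2 * eps / (1 - 2 * eps)


def l1_error_constant(eps):
    """Constant 2 / (1 - 2 alpha(eps)) multiplying ||x - x_k||_1 in (67)."""
    return 2 / (1 - 2 * alpha(eps))


assert alpha(Fraction(1, 6)) == Fraction(1, 2)   # boundary case: the denominator vanishes
assert l1_error_constant(Fraction(1, 10)) == 4   # eps < 1/6 gives a positive constant
assert l1_error_constant(Fraction(1, 5)) < 0     # eps > 1/6 makes the bound vacuous
```

Equivalently, $2\epsilon/(1-2\epsilon) < 1/2$ rearranges to $4\epsilon < 1 - 2\epsilon$, that is $\epsilon < 1/6$.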

2) Sequential Sparse Matching Pursuit (SSMP): Introduced by Indyk and Ruzic in [30], SSMP has evolved as an improvement of Sparse Matching Pursuit (SMP), which was an improvement on Expander Matching Pursuit (EMP). EMP, also introduced by Indyk and Ruzic in [28], uses a voting-like mechanism to identify and eliminate large (in magnitude) components of the signal. EMP's drawback is that the empirical number of measurements it requires to achieve correct recovery is suboptimal. SMP, introduced by Berinde, Indyk and Ruzic in [29], improved on the drawback of EMP. However, its original version had convergence problems when the input parameters ($k$ and $n$) fall outside the theoretically guaranteed region. This is fixed by the SMP package, which forces convergence when the user provides an additional convergence parameter. In order to correct the aforementioned problems of EMP and SMP, Indyk and Ruzic developed SSMP. It is a version of SMP where updates are done sequentially instead of in parallel; consequently convergence is automatically achieved. All three algorithms have the same theoretical recovery guarantees, which we state in Theorem 3.4, but SSMP has better empirical performance compared to its predecessors.

Algorithm 1 below is a pseudo-code of the SSMP algorithm based on the following problem setting. The measurement matrix $A$ is an $n \times N$ adjacency matrix of a lossless $((c+1)k, d, \epsilon/2)$-expander scaled by $d$ and $A$ has a lower RIC$_1$, $L((c+1)k, n, N) = \epsilon$. The measurement vector is $y = Ax + e$ where $e$ is a noise vector and $\eta = \|e\|_1$. We denote by $H_k(y)$ the hard thresholding operator which sets to zero all but the largest, in magnitude, $k$ entries of $y$.
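
A minimal sketch of the hard thresholding operator $H_k$ follows. The function name and the tie-breaking rule (earlier index wins) are choices made for this example; the text does not specify how ties are handled.

```python
def hard_threshold(y, k):
    """H_k(y): keep the k largest-magnitude entries of y and zero the rest.

    Ties are broken in favour of the earlier index (a choice made here).
    """
    order = sorted(range(len(y)), key=lambda i: -abs(y[i]))
    keep = set(order[:k])
    return [v if i in keep else 0 for i, v in enumerate(y)]


print(hard_threshold([3, -5, 1, 4], 2))  # [0, -5, 0, 4]
```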

The recovery guarantees for SSMP (also for EMP and SMP) are formalized by the following theorem, from which we deduce the recovery condition (sampling theorem) in terms of $\epsilon$ in Corollary 3.5. Based on Corollary 3.5, deduced from Theorem 3.4, we derived the phase transition, $\rho^{SSMP}(\delta; d, \epsilon)$, for SSMP.

Algorithm 1 Sequential Sparse Matching Pursuit (SSMP) [30]
Input: $A$, $y$, $\eta$
Output: $k$-sparse approximation $\hat x$ of the target signal $x$
Initialization:
  1. Set $j = 0$
  2. Set $x_j = 0$
Iteration: Repeat $T = O(\log(\|x\|_1/\eta))$ times
  1. Set $j = j + 1$
  2. Repeat $(c-1)k$ times
     a) Find a coordinate $i$ and an increment $z$ that minimizes $\|A(x_j + z e_i) - y\|_1$
     b) Set $x_j$ to $x_j + z e_i$
  3. Set $x_j = H_k(x_j)$
Return $\hat x = x_T$

Algorithm 2 Expander Recovery (ER) [23], [7]
Input: $A$, $y$
Output: $k$-sparse approximation $\hat x$ of the original signal $x$
Initialization:
  1. Set $\hat x = 0$
Iteration: Repeat at most $2k$ times
  1. if $y = A\hat x$ then
  2.   return $\hat x$ and exit
  3. else
  4.   Find a variable node $\hat x_j$ such that at least $(1-2\epsilon)d$ of the measurements it participated in have identical gap $g$
  5.   Set $\hat x_j = \hat x_j + g$, and go to 1.
  6. end if

Theorem 3.4 (Theorem 10, H2E\l ): Let A be an adjacency 
matrix of a lossless $(k,d,\epsilon)$-expander graph with $\epsilon < 1/16$. Given a vector $y = Ax + e$, the algorithm returns an approximation vector $\hat x$ satisfying

$$\|x - \hat x\|_1 \le \frac{1-4\epsilon}{1-16\epsilon}\,\|x - x_k\|_1 + \frac{6}{(1-16\epsilon)d}\,\|e\|_1, \qquad (68)$$

where $x_k$ is the $k$ largest (in magnitude) coordinates of $x$.

Corollary 3.5 ([28]): SSMP, EMP, and SMP are all guaranteed to recover any $k$-sparse vector from its linear measurement by an adjacency matrix of a lossless $(k,d,\epsilon)$-expander graph with $\epsilon < 1/16$.

3) Expander Recovery (ER): Introduced by Jafarpour et al. in [23], [7], ER is an improvement on an earlier algorithm introduced by Xu and Hassibi in [24] known as Left Degree Dependent Signal Recovery (LDDSR). The improvement was mainly in the number of iterations used by the algorithms and the type of expanders used, from $(k,d,1/4)$-expanders for LDDSR to $(k,d,\epsilon)$-expanders for any $\epsilon < 1/4$ for ER. Both algorithms use the concept of a gap defined below.

Definition 3.6 (gap, [23], [7]): Let $x$ be the original signal and $y = Ax$. Furthermore, let $\hat x$ be our estimate for $x$. For each value $y_i$ we define a gap $g_i$ as:

$$g_i = y_i - \sum_{j=1}^{N} A_{ij}\,\hat x_j. \qquad (69)$$



Algorithm 2 is a pseudo-code of the ER algorithm for an original $k$-sparse signal $x \in \mathbb{R}^N$ and the measurements $y = Ax$ with an $n \times N$ measurement matrix $A$ that is an adjacency matrix of a lossless $(2k,d,\epsilon)$-expander with $\epsilon < 1/4$. The measurements are assumed to be without noise, so we aim for exact recovery. The authors of [23], [7] have a modified version of the algorithm for when $x$ is almost $k$-sparse.

Theorem 3.7 gives recovery guarantees for ER. Directly from this theorem we read off the recovery condition in terms of $\epsilon$ for Corollary 3.8, from which we derive phase transition functions, $\rho^{ER}(\delta; d, \epsilon)$, for ER.

Theorem 3.7 (Theorem 6, $Z§): Let A € R nxJV be the 
adjacency matrix of a lossless $(2k,d,\epsilon)$-expander graph, where $\epsilon < 1/4$ and $n = O(k\log(N/k))$. Then, for any $k$-sparse signal $x$, given $y = Ax$, ER recovers $x$ successfully in at most $2k$ iterations.



Corollary 3.8: ER is guaranteed to recover any $k$-sparse vector from its linear measurement by an adjacency matrix of a lossless $(k,d,\epsilon)$-expander graph with $\epsilon < 1/4$.
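
The ER iteration can be sketched in a few lines. This is a simplified illustration and not the authors' implementation: the matrix is passed as a list of neighbor rows per column, gaps follow (69), and gap values are compared with exact equality, which is an assumption that suits the exactly representable toy data below but not general floating-point input. The toy matrix, whose columns have disjoint supports, is hypothetical.

```python
from collections import Counter


def expander_recovery(neighbors, y, k, eps):
    """Sketch of ER for a noiseless y = A x with a 0/1 matrix A.

    neighbors[j] lists the d rows in which column j of A has a one.
    """
    d = len(neighbors[0])
    x_hat = [0.0] * len(neighbors)
    for _ in range(2 * k):
        # Gap of (69): g_i = y_i - sum_j A_ij * x_hat_j.
        gaps = list(y)
        for j, rows in enumerate(neighbors):
            for i in rows:
                gaps[i] -= x_hat[j]
        if all(g == 0 for g in gaps):
            return x_hat
        for j, rows in enumerate(neighbors):
            value, count = Counter(gaps[i] for i in rows).most_common(1)[0]
            # With eps < 1/4 the threshold exceeds d / 2, so only the most
            # common gap value of a column can reach it.
            if value != 0 and count >= (1 - 2 * eps) * d:
                x_hat[j] += value
                break
        else:
            break  # no column qualifies; stop early
    return x_hat


# Toy matrix (hypothetical): 4 columns of degree 3 with disjoint supports.
neighbors = [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]]
x = [0.0, 2.5, 0.0, -1.0]
y = [0.0] * 12
for j, rows in enumerate(neighbors):
    for i in rows:
        y[i] += x[j]
print(expander_recovery(neighbors, y, k=2, eps=0.2))  # [0.0, 2.5, 0.0, -1.0]
```

Each pass corrects one coordinate, so the 2-sparse toy signal is recovered within the $2k = 4$ allowed iterations.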




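[Figure 10: plot against $n/N$; legible legend entries: $\rho^{\ell_1}(\delta; d, \epsilon = 1/6)$, $\rho^{ER}(\delta; d, \epsilon = 1/4)$.]

Fig. 10. Phase transition curves $\rho^{alg}(\delta; d, \epsilon)$ computed over finite values of $\delta \in (0,1)$ with $d$ fixed and the different $\epsilon$ values for each algorithm: 1/4, 1/6 and 1/16 for ER, $\ell_1$ and SSMP respectively.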



4) Comparisons of phase transitions of algorithms: Figure 10 compares the phase transition plot of $\rho^{SSMP}(\delta; d, \epsilon)$ for SSMP (also for EMP and SMP), the phase transition plot of $\rho^{ER}(\delta; d, \epsilon)$ for ER (also for LDDSR) and the phase transition plot of $\rho^{\ell_1}(\delta; d, \epsilon)$ for $\ell_1$-minimization. Remarkably, for ER and LDDSR recovery is guaranteed for a larger portion of the $(\delta, \rho)$ plane than is guaranteed by the theory for $\ell_1$-minimization using sparse matrices; however, $\ell_1$-minimization has a larger recovery region than do SSMP, EMP, and SMP.

Figure 11 shows a comparison of the phase transition of $\ell_1$-minimization as presented by Blanchard et al. in [13] for dense Gaussian matrices based on RIC$_2$ analysis and the phase transition we derived here for the sparse binary matrices coming from lossless expanders based on RIC$_1$ analysis. This shows a remarkable difference between the two, with sparse matrices having better performance guarantees; this improvement is achieved through RIC$_1$ being more closely related to $\ell_1$-minimization than is RIC$_2$.
Fig. 11. Phase transition plots of $\ell_1$, $\rho_G(\delta)$, for Gaussian matrices derived using RIC$_2$ and $\rho^{\ell_1}(\delta; d, \epsilon)$ for adjacency matrices of expander graphs with $n = 1024$, $d = 8$, and $\epsilon = 1/6$.

IV. Proof of main results
A. Proof of Theorem 1.6



By the dyadic splitting \A S 



A r#i u ^?#j 



Prob (\A a 



< a„) = Prob 
Prob 



= E E 
= EEE P *M 



= L 



and therefore 

o, ) (70) 
(71) 



Prob 



(KiiHrn 



1 2 

-1 ' 4 Lf J 



Prob i 



A 2 



- 7 2 
-*L«J 



(72) 



From (70) to (71) we sum over all possible events, while from (71) to (72), in line with the splitting technique, we simplify the probability to the product of the probabilities of the cardinalities of $A^1_{\lceil s/2\rceil}$ and $A^2_{\lfloor s/2\rfloor}$ and their intersection.

In a slight abuse of notation we write $\sum_{j=1,\ldots,x}$ to denote applying the sum $x$ times. Now we use Lemma 2.5 to simplify (72) as follows.



E E E 



Qo Qi 
j 1 = l,...,q j 2 =l,...,gi J3=l 

9i 



7J1 ;2j'i-l ;2ji 



n prob (l^l=^) 



A Ri 



(73) 



x Yl Prob ( 

33=91+1 

Let's quickly verify that (73) is the same as (72). By Lemma 2.5, $Q_0 = s$ is the number of columns in the set at the zeroth level of the split while $q_0 = 1$ is the number of sets with $Q_0$ columns at the zeroth level of the split. Thus for $j_1 = 1$ the first summation and the $P_n(\cdot)$ term are the same in the two equations. If $\lceil s/2\rceil = \lfloor s/2\rfloor$, then they are both equal to $Q_1$ and $q_1 = 2$ while $r_1 = 0$. If on the other hand $\lceil s/2\rceil = \lfloor s/2\rfloor + 1$, then $q_1 = 1$ and $r_1 = 1$. In either case we have the remaining part of the expression of (72), i.e. the second two summations and the product of the two $\mathrm{Prob}(\cdot)$.
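
The bookkeeping of the dyadic split can be traced with a short sketch. It assumes, as the surrounding text suggests, that at level $j$ the larger set size is $Q_j = \lceil s/2^j\rceil$, that $q_j$ counts the sets of that size and $r_j$ the remaining sets, and that $q_j + r_j = 2^j$. The function name is illustrative.

```python
def dyadic_levels(s):
    """Sizes of the column sets at each level of the dyadic split of s columns.

    Level 0 is [s]; every set of size m is split into ceil(m/2) and floor(m/2),
    for ceil(log2 s) - 1 splitting steps in total.
    """
    levels = [[s]]
    for _ in range((s - 1).bit_length() - 1):
        nxt = []
        for m in levels[-1]:
            nxt += [(m + 1) // 2, m // 2]
        levels.append(nxt)
    return levels


# Print j, the larger size, the number of sets of that size, and the number of remaining sets.
for j, sizes in enumerate(dyadic_levels(5)):
    big = max(sizes)
    print(j, big, sizes.count(big), len(sizes) - sizes.count(big))
# 0 5 1 0
# 1 3 1 1
# 2 2 1 3
```

For $s = 5$ the last level holds one set of two columns and three single columns, matching the remark that the split stops once sets of one or two columns are reached.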

Now we proceed with the splitting; note that (73) stopped only at the first level. At the next level, the second, we will have $q_2$ sets with $Q_2$ columns and $r_2$ sets with $R_2$ columns, which leads to the following expression.



E E E p ™ ( l Qo< l ^y l2 \jL) ) 

,31 ,32 ,33 * ' 

l Qo Ql Rl 

ji=l,...,qo j 2 =l,...,8i js=l,...,ri 

E E p » ( z q 2 i ' 1 rli] 1 ' l2 yL j ) 



Q2 



,35 
l H 2 



J4=l,...,g2 j5 = l,---^2 



p» fe' i r j i" 1 i .?i J ) x n prob 

92 +r 2 

J| Prob ( 



J5=92 + l 



- I 35 

l Ri 



(74) 



We continue this splitting of each instance of $\mathrm{Prob}(\cdot)$ for $\lceil\log_2 s\rceil - 1$ levels until reaching sets with single columns where, by construction, the probability that the single column has $d$ nonzeros is one. This process gives a complicated product of nested sums of $P_n(\cdot)$ which we express as



E E E M^o'^r^j) 

,31 ,32 .33 V 

Qo Ql Hi 

jl=l,...,q j2 = l,-..,9l J3=l, — iH. 



/34 /35 
'Q 2 «2 
l'4 = l,..-,92 35 = li".,r2 



E 



; 32riog 2 «1 -2 

Q rio E2 =i~i 

j2rio g2S l-2 = 1 .---,9j riog2al _ 1 



p f,32[la g2 sl-4 i 2 j2riog 2 »l-4-t 7 2 j2 riog 2 si -4 \ 

p (jh[io e2 =1-3 ,2j2riog 2 =l-3- 1 7 



(' 



P n [l^^\d,d) 



(75) 



Using the definition of $P_n(\cdot)$ in Lemma 2.3 we bound (75) by bounding each $P_n(\cdot)$ as in (59) with a product of a polynomial, $\pi(\cdot)$, and an exponential with exponent $\psi_n(\cdot)$.



E E E ^ ( ^Qd>^«jii '^Saj 



;J1 



ji=l,...,go J2=l, J3=l,— in 



Qo' r «p.-| ' L Qu.j 



I lh ,2j 2 -l fin 
f lis i2is-l ;2j 3 



E E 

j4=l,---,92 i 5 = l,---,'"2 



J2 i 2 :^- 1 j 2 J2 



' Rl r^i l^j 



E 



/iiariog2 si-4 7 2 i2rio E2 «i -4-1 ,2j2rio E2 »i-4\ 



x e 



32riog 2S l-4 2 J2riog 2a l-4- 1 2 32riog 2 =1-4 
4 ' l 2 '2 

^^2riog 2sl -3 ;Z 2 j2riog23l _ 3 -l ; ^ 

^'ariog 2 .1-3 ; 2 J2riog 2 si-3- 1 



x e 



^2riog 2a i- 2 



(76) 



Using Lemma 2.6 we maximize the $\psi_n(\cdot)$ and hence the exponentials. If we maximize each by choosing $l_{(\cdot)}$ to be $a_{(\cdot)}$, then we can pull the exponentials out of the product. The exponential will then have the exponent $\Psi_n(a_s,\ldots,a_2,d)$. The factor involving the $\pi(\cdot)$ will be called $\Pi(l_s,\ldots,l_2,d)$ and we have the following upper bound for (76).

$$\Pi(l_s,\ldots,l_2,d)\cdot\exp\left[\Psi_n(a_s,\ldots,a_2,d)\right], \qquad (77)$$

where the exponent $\Psi_n(a_s,\ldots,a_2,d)$ is given by

$$\psi_n\!\left(a_{Q_0}, a_{\lceil Q_0/2\rceil}, a_{\lfloor Q_0/2\rfloor}\right) + \cdots + \psi_n(a_2, d, d). \qquad (78)$$

Now we attempt to bound the probability of interest in (70). This task reduces to bounding $\Pi(l_s,\ldots,l_2,d)$ and $\Psi_n(a_s,\ldots,a_2,d)$ in (77) and we start with the former, i.e. bounding $\Pi(l_s,\ldots,l_2,d)$. We bound each sum of $\pi(\cdot)$ in $\Pi(l_s,\ldots,l_2,d)$ of (77) by the maximum of the summands multiplied by the number of terms in the sum. From (63) we see that $\pi(\cdot)$ is maximized when all three arguments are the same, and using Corollary 2.11 we take the largest possible arguments that are equal in the range of the summation. In this way the following proposition provides the bound we end up with.

Proposition 4.1: Let each summation over the sets with the same number of columns have the same range, where the range we take is the maximum possible for each such set. Let us also maximize $\pi(\cdot)$ where all its three input variables are equal and are equal to the maximum of the third variable. Then we bound each sum by the largest term in the sum multiplied by the number of terms. This scheme
combined with Lemma [231 give the following upper bound on 

U(l s ,...,l 2 ,d). 




9riog 2 si-i 



(79) 



Proof: From d63l ) we have 



ir(y,y,y) 



Simply put, we bound $\sum_x \pi(x,y,z)$ by multiplying the maximum of $\pi(x,y,z)$ with the number of terms in the summation. Remember the order of magnitude of the arguments of $\pi(x,y,z)$ is $x \ge y \ge z$. Therefore, the maximum of $\pi(x,y,z)$ occurs when the arguments are all equal to the maximum value of $z$. In our splitting scheme the maximum possible value of
is I TjjM • d since there are d nonzeros in each column. 



Also 1 1 o^j <l Q . 



< I 



l r Qj -J so the number of terms 



in the summation over lg j is [^-1 • d, and similarly for Rj. 
We know the values of the Qj and the Rj and their quantities 
qj and Tj respectively from Lemma |2~31 

We replace y by [^-J • d or [-y-J • d accordingly into the 
bound of ?r(y, y, y) in (f80b and multiply by the number of 
terms in the summation, i.e. ■ d or \^f~\ ■ d. This product 
is then repeated qj or rj times accordingly until the last level 
of the split, j = [log 2 s] — 1, where we have on og2 s ]-i an d 
Qriog 2 s]-i (which is equal to 2). We exclude J?n og s n_i since 
ifin og = d. Putting the whole product together results to 



791 ) hence concluding the proof of Proposition 142 
As a final step we need the following corollary. 
Corollary 4.2:

$$\Pi(l_s,\ldots,l_2,d) < \frac{2}{25\sqrt{2\pi s^3 d^3}}\cdot\exp\left[3s\log(5d)\right]. \qquad (81)$$



Proof: From Lemma 2.5 we can upper bound $R_j$ by $Q_j$. Consequently (79) is upper bounded by the following.

Qo , /5\ / Qo 



" ' 3 



pog 2 si -2 

n I 

3=1 
3 [~log 2 g] - 1 



'2vr 



— \ <?o 

d) x 



'2vr 



d I x 



/2vr 



Q riog 2 «] - 



(82) 



Now we use the property that $q_j + r_j = 2^j$ for $j = 1, \ldots, \lceil\log_2 s\rceil - 1$ from Lemma 2.5 to bound (82) by the following.



flog 2 s]-l 

n 

3=0 



<2tt 



Qj 



(83) 



We have a strict upper bound when $r_{\lceil\log_2 s\rceil-1} \ne 0$, which occurs when $s$ is not a power of 2, because then by $q_j + r_j = 2^j$ we have $q_{\lceil\log_2 s\rceil-1} + r_{\lceil\log_2 s\rceil-1} = 2^{\lceil\log_2 s\rceil-1}$. In fact (83) is an overestimate for a large $s$ which is not a power of 2.

Note Q 3 = by Lemma O Thus \^f \ = \^] and 



2 



— [2^1 ■ ^° we bound (l83l by the following. 



power 3/2 in the outside and this gives the following. 



'25\/27r\ 



nig 



3=0 



3/2 



25V2^ 
16 



25V2^ 
16 



(sd) E -=° 2 



log, s . v 2 1 



n (I 



3/2 



(sd) 



^log 2 > ., 

2s -i 1 n^=° J 



3/2 



(86) 



(87) 



(88) 



From (86) to (87) we evaluate the power of the first factor, which is a geometric series, and we again use the rule of indices for the $sd$ factor. Then from (87) to (88) we use the rule of indices for the last factor and evaluate the power of the $sd$ factor, which is also a geometric series. We simplify the power of the last factor by using the following.

$$\sum_{k=1}^{m} k\cdot 2^k = (m-1)\cdot 2^{m+1} + 2. \qquad (89)$$



riog 2 s]-i 

n 

3=0 



23+1 



2tt 



2-'- 



(84) 



Next we upper bound [log 2 s] — 1 in the limit of the 
product by log 2 s and upper bound [ojTt! by 2^" + ? = 
~fr + ^i - ) ' we a l so move the d into the square root and 
combined the constants to have the following bound on 



log 2 s 

n 

3=0 L 



2^ 



2 3+i 



25V2tt 
16 



2^ 



d 3 



We bound ^1 + \ by 2 to bound the above by 



log 2 s 

n 

3=0 



s / 25V2tt 
2J 7 I 16 



^ 3 
2J 



n 

3=0 



2-' 



25V2tt 



16 



s 3 d 3 



2 3 J 



(85) 



where we moved $s/2^j$ into the square root. Using the rule of indices the product of the constant term is replaced by its power to the sum of the indices. We then rearranged to have the



This therefore simplifies ( 1881 ) as follows. 



25V2tt 



2s- 1 



16 / 

16 

'25^2^ 



(sd) 



2s-l 



(log 2 s-l)-2 log 2 s + !+2' 



16 

'25^/277" 
16 

'25\/2T" 



(sd) 



(sd) 2s 
4sd 



16 



1 \ 2s(log 2 s-1) -, 
2s-l I 1 \ 1 



27 4 

-1 3/2 



3/2 



16 



25V2t7 



16 

25^/2^ 



2~ 2s log 2 S22s 



(2^) 2s s - 2s ] 3/2 



4sd 



(2d) 



2 s 



4sd 



3/2 



3/2 

(90) 
(91) 

(92) 

(93) 

(94) 



From (90) through (92) we simplified using basic properties of indices and logarithms, while from (92) to (93) we incorporated $2^{2s}$ into the first factor inside the square brackets and rewrote the first factor as a product of a power in $s$ and another without $s$. From (93) to (94) the $s^{2s}$ and $s^{-2s}$ canceled out.

Now we expand the square brackets in (94) to have (95) below.
B. Main Corollaries



/ 25y^ 
2 



16 



1 



25^2^/ 8VW 



(2d) 



3 s 



2.s 



(2d) 



3 s 



25V2irs 3 d 3 



< 



2bV2irs 3 d 3 
2 



■ exp ^3s log ^2 

• exp [3s log(5d)] 




(95) 

(96) 

(97) 
(98) 



2/3 



25V27rs 3 d 3 

From (95) to (96) we simplified and from (96) to (97) we rewrote the powers as an exponential with a logarithmic exponent. Then from (97) to (98) we upper bounded $2\left(\frac{25\sqrt{2\pi}}{16}\right)^{2/3}$ by 5, which gives the required format of a product of a polynomial and an exponential to conclude the proof of the corollary. ■
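
Two numerical facts used in this proof are easy to confirm. The sketch below assumes that the constant bounded by 5 is $2(25\sqrt{2\pi}/16)^{2/3}$ and that the prefactor in (81) is $2/(25\sqrt{2\pi s^3 d^3})$; the function name `p_max` is illustrative.

```python
import math

# Constant that is rounded up to 5 in the last step of the proof (roughly 4.97).
constant = 2 * (25 * math.sqrt(2 * math.pi) / 16) ** (2 / 3)
assert 4.9 < constant < 5

# Identity (89): sum_{k=1}^{m} k * 2**k == (m - 1) * 2**(m + 1) + 2.
for m in range(1, 21):
    assert sum(k * 2**k for k in range(1, m + 1)) == (m - 1) * 2 ** (m + 1) + 2


def p_max(s, d):
    """Polynomial prefactor 2 / (25 * sqrt(2 pi s^3 d^3)) assumed for (81)."""
    return 2 / (25 * math.sqrt(2 * math.pi * s**3 * d**3))


print(p_max(4, 8) < 1)  # True: the prefactor decays polynomially in s and d
```

Because the constant sits just below 5, replacing it by 5 costs very little in the exponent $3s\log(5d)$.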
With the bound in Corollary 4.2 we have completed the bounding of $\Pi(l_s,\ldots,l_2,d)$ in (77). Next we bound $\Psi_n(a_s,\ldots,a_2,d)$, which is given by (78). Lemma 2.5 gives the three arguments for each $\psi_n(\cdot)$ and the number of $\psi_n(\cdot)$ with the same arguments. Using this lemma we express $\Psi_n(a_s,\ldots,a_2,d)$ as



rio g2 ( S )i-2 

E 

3=0 



Qi,avQ£\ ,a\ qj_\ 



9riog 2 (s)l-i • i>n {a 2 ,d,d) . (99) 

Equation ( l99t is bounded above in Lemma |2~71 by the follow- 
ing. 

ri°g2( s )i- 1 / \ 

^2 v ■ ^n\a Qjl ay^,ayRj_^y (100) 



If we let the a-n = aq j and a, = a ^ rj j we have ( llOOl i equal 
to the following. 



[s/2] \s/2\ 



+ (n — a,) • H 



i=l 



di ■ H 



02( 



/a, 
HI - 

V rt 



(101) 



n — di / \ n . 

Now we combine the bound of n (Z s , . . . , I2, d) in (T8TT > and the 
exponential whose exponent is the bound of ^> n (a s , . . . , 02 , d) 
in ( 11011 ) to get ©, the polynomial p ma x{s,d) = ^JL_, 
and (0, the exponent of the exponential ^> (a s , . . . , d) which 
is given by the sum of 3s log (5d) and the right hand side of 
([TOlI 

Lemma 12.81 gives the a% that maximize (11011 ) and the 
systems (1531 and (1541 ) they satisfy depending on the constraints 
on a s . Solving completely the system ((53) gives a, in (|52l and 
(3]i which are the expected values of the a^. The system (f5]l 
is equivalent to (54), hence also proven in Lemma 2.8. This therefore concludes the proof of Theorem 1.6.



In this section we present the proofs of the corollaries in Sections I-A and I-B. These include the proof of Corollary 1.7 in Section IV-B1, the proof of Corollary 1.10 in Section IV-B2 and the proof of Corollary 1.11 given in Section IV-B3.

1) Corollary 1.7: Satisfying RIP-1 means that for any $s$-sparse vector $x$, $\|A_S x\|_1 \ge (1-2\epsilon)d\|x\|_1$, which indicates that the cardinality of the set of neighbors satisfies $|A_s| \ge (1-\epsilon)ds$. Therefore

$$\mathrm{Prob}\left(\|A_S x\|_1 \le (1-2\epsilon)d\|x\|_1\right) = \mathrm{Prob}\left(|A_s| \le (1-\epsilon)ds\right). \qquad (102)$$



This implies that a s = (1 — e)ds and since this is restricting 
a s to be less than it's expected value given by (0]), the rest 
of the a. L satisfy the polynomial system (0. If there exists a 
solution then the a, t would be functions of s, d and e which 
makes \& (a s , . . . , a%, d) = ^ (s, d, e). 

2) Corollary 1.10: Corollary 1.7 states that by fixing $S$ and the other parameters, $\mathrm{Prob}\left(\|A_S x\|_1 \le (1-2\epsilon)d\|x\|_1\right) < p_{max}(s,d)\cdot\exp\left[n\cdot\Psi(s,d,\epsilon)\right]$. Corollary 1.10 considers any $S \subset [N]$ and, since the matrices are adjacency matrices of lossless expanders, we need to consider any $S \subset [N]$ such that $|S| \le k$. Therefore our target is $\mathrm{Prob}\left(\|Ax\|_1 \le (1-2\epsilon)d\|x\|_1\right)$, which is bounded by a simple union bound over all $\binom{N}{s}$ sets $S$; treating each set $S$, of cardinality at most $k$, independently, we sum over this probability to get the following bound.



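$$\sum_{s=2}^{k} \binom{N}{s} \cdot \mathrm{Prob}\left(\|A_S x\|_1 < (1 - 2\epsilon) d \|x\|_1\right) \tag{103}$$

$$< \sum_{s=2}^{k} \binom{N}{s} \cdot p_{\max}(s, d) \cdot \exp\left[n \cdot \Psi(s, d, \epsilon)\right] \tag{104}$$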



fc / r \ 2 



s=2 



2ns(l-f) 



' Pmax 

(s,d) 



exp [^VH(-^) +n-*{s,d,e) 



(105) 



< k 



x exp 



Pmax {k, d) 

.V|H|^)+^.*(M,e: 



(106) 



From (103) to (104) we bound the probability in (103) using Corollary 1.7. Then from (104) to (105) we bound $\binom{N}{s}$ using Stirling's formula (64) by a polynomial in $N$, which multiplies $p_{\max}(s, d)$, and an exponential, which is incorporated into the exponent of the exponential term. From (105) to (106) we use that for $N > 2k$ the entropy $H(s/N)$ is largest when $s = k$, and we bound the summation by taking the maximum value of $s$ and multiplying by the number of terms in the summation plus one, giving $k$. This gives $p'_{\max}(N, k, d) =$



Mi)' 



„(fc,<2) 



which simplifies to 



and 



the factor $\Psi_{net}(k, n, N; d, \epsilon) = H\left(\frac{k}{N}\right) + \frac{n}{N} \cdot \Psi(k, d, \epsilon)$ is what multiplies $N$ in the exponent, as claimed.
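Two facts used in this step can be checked numerically: the entropy $H(s/N)$ increases with $s$ as long as $s \le k < N/2$, and $\binom{N}{s}$ is bounded by $\exp[N \cdot H(s/N)]$. The Python sketch below checks both for one hypothetical pair $(N, k)$. It uses the plain entropy bound without the polynomial prefactor of Stirling's formula (64), so it illustrates the idea rather than reproducing (105).

```python
import math


def shannon_entropy(p):
    """H(p) = -p log p - (1 - p) log(1 - p), with natural logarithms."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)


def log_binomial(N, s):
    """log of the binomial coefficient C(N, s), via log-gamma."""
    return math.lgamma(N + 1) - math.lgamma(s + 1) - math.lgamma(N - s + 1)


# Hypothetical sizes satisfying N > 2k.
N, k = 1000, 100

# H(s/N) for s = 2, ..., k: increasing, so the largest term is at s = k.
entropies = [shannon_entropy(s / N) for s in range(2, k + 1)]
is_increasing = all(a < b for a, b in zip(entropies, entropies[1:]))

# Entropy bound on the binomial coefficient: log C(N, s) <= N * H(s/N).
entropy_bound_holds = all(
    log_binomial(N, s) <= N * shannon_entropy(s / N) for s in range(2, k + 1)
)
```

With the largest term at $s = k$, replacing every term of the sum by that term and multiplying by a count of $k$ gives an upper bound of the kind used to pass from (105) to (106).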
3) Corollary 1.11: Corollary 1.10 has given us an upper bound on the probability $\mathrm{Prob}\left(\|Ax\|_1 < (1 - 2\epsilon) d \|x\|_1\right)$. In this bound the exponential dominates the polynomial. Consequently, in the limit as $(k, n, N) \to \infty$ while $k/n \to \rho \in (0, 1)$ and $n/N \to \delta \in (0, 1)$, this bound has a sharp transition at the zero level curve of $\Psi_{net}$. For $\Psi_{net}(k, n, N; d, \epsilon)$ strictly bounded above zero the overall bound grows exponentially in $N$ without limit, while for $\Psi_{net}(k, n, N; d, \epsilon)$ strictly bounded below zero the overall bound decays to zero exponentially quickly. We define $\rho^{exp}(\delta; d, \epsilon)$ to satisfy $\Psi_{net}(k, n, N; d, \epsilon) = 0$, so that for any $\rho$ strictly less than $\rho^{exp}(\delta; d, \epsilon)$ the exponent satisfies $\Psi_{net}(k, n, N; d, \epsilon) < 0$ and hence the bound decays to zero.

More precisely, for $k/n \to \rho < (1 - \gamma) \rho^{exp}(\delta; d, \epsilon)$ with small $\gamma > 0$, we have $\mathrm{Prob}\left(\|Ax\|_1 < (1 - 2\epsilon) d \|x\|_1\right) \to 0$. Therefore, $\mathrm{Prob}\left(\|Ax\|_1 \ge (1 - 2\epsilon) d \|x\|_1\right) \to 1$ as the problem size grows such that $(k, n, N) \to \infty$, $n/N \to \delta \in (0, 1)$ and $k/n \to \rho$.

V. Appendix 



A. Proof of Corollary 2.2

The first part of this proof uses ideas from the proof of Proposition 2.1, which is the same as Theorem 16 in [26]. We consider a bipartite graph $G(U, V, E)$ with $|U| = N$ left vertices, $|V| = n$ right vertices and left degree $d$. For a fixed $S \subset U$ where $|S| = s \le k$, $G$ fails to be an expander on $S$ if $|\Gamma(S)| < (1 - \epsilon) d s$. This means that, in a sequence of $ds$ vertex indices, at least $\epsilon d s$ of these indices are in the collision set, that is, each of them is identical to some preceding value in the sequence.

Therefore, the probability that a neighbor chosen uniformly at random is in the collision set is at most $ds/n$ and, treating each event independently, the probability that a set of $\epsilon d s$ neighbors chosen at random are all in the collision set is at most $(ds/n)^{\epsilon d s}$. There are $\binom{ds}{\epsilon d s}$ ways of choosing a set of $\epsilon d s$ points from a set of $ds$ points and $\binom{N}{s}$ ways of choosing each set $S$ from $U$. Therefore the probability that $G$ fails to expand on at least one of the sets $S$ of fixed size $s$ can be bounded above by a union bound



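$$\mathrm{Prob}\left(G \text{ fails to expand on } S\right) \le \binom{N}{s} \binom{ds}{\epsilon d s} \left(\frac{ds}{n}\right)^{\epsilon d s}. \tag{107}$$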
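The right-hand side of (107) is a product of two binomial coefficients and a power, so it is easiest to evaluate on a logarithmic scale. The Python sketch below does this with log-gamma functions. The function names and the parameter values are hypothetical, and $\epsilon d s$ is allowed to be non-integer by extending the binomial coefficient through the gamma function, which is an assumption of this sketch.

```python
import math


def log_binomial(a, b):
    """log C(a, b), extended to real arguments through log-gamma."""
    return math.lgamma(a + 1) - math.lgamma(b + 1) - math.lgamma(a - b + 1)


def log_failure_bound(N, n, d, s, eps):
    """log of C(N, s) * C(ds, eps*ds) * (ds/n)**(eps*ds)."""
    ds = d * s
    collisions = eps * ds  # number of colliding indices required for failure
    return (
        log_binomial(N, s)
        + log_binomial(ds, collisions)
        + collisions * math.log(ds / n)
    )


# Hypothetical parameters for which the bound is below one (log < 0).
example = log_failure_bound(N=4096, n=2048, d=16, s=2, eps=0.25)
# Doubling the number of rows n makes collisions less likely.
example_more_rows = log_failure_bound(N=4096, n=4096, d=16, s=2, eps=0.25)
```

A negative logarithm means the union bound is informative for that set size. For larger $s$ with the same parameters the bound can exceed one, which is why the sign of the exponent matters in what follows.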



We define $p_s$ to be the right-hand side of (107), and we use the right-hand side of Stirling's inequality (64) to upper bound $p_s$ as follows



Ps< l 



„ eds ( eds . 
2tt— 1 — | eds 

ds V ds 



exp 



2tt— (1- —)N 



x exp 



NH 



ds 



(108) 



Writing the last multiplicand of (108) in exponential form and simplifying the expression gives

$$p_s < p_{\max}(N, s; d, \epsilon) \cdot \exp\left[N \cdot \Psi(s, n, N; d, \epsilon)\right], \tag{109}$$

where $\Psi(s, n, N; d, \epsilon)$ is

$$H\left(\frac{s}{N}\right) + \frac{ds}{N} H(\epsilon) + \frac{\epsilon d s}{N} \log\left(\frac{ds}{n}\right), \tag{110}$$

and $p_{\max}(N, s; d, \epsilon)$ is a polynomial in $N$ and $s$ for each fixed $d$ and $\epsilon$, given by



1 

2tts 



N 



e(l - e)(N - s)d 



(111) 



Finally, $G$ fails to be an expander if it fails to expand on at least one set $S$ of any size $s \le k$. Therefore

$$\mathrm{Prob}\left(G \text{ fails to be an expander}\right) \le \sum_{s=2}^{k} p_s. \tag{112}$$

From (109) we have $\sum_{s=2}^{k} p_s$ bounded by

$$\sum_{s=2}^{k} p_{\max}(N, s; d, \epsilon) \cdot \exp\left[N \cdot \Psi(s, n, N; d, \epsilon)\right] \tag{113}$$

$$< p'_{\max}(N, k; d, \epsilon) \cdot \exp\left[N \cdot \Psi(k, n, N; d, \epsilon)\right], \tag{114}$$

where $p'_{\max}(N, k; d, \epsilon) = k \cdot p_{\max}(N, k; d, \epsilon)$, and we obtain the bound from (113) to (114) by upper bounding the sum with the product of the largest term in the sum (which occurs at $s = k$ since $k < N/2$) and one plus the number of terms in the sum, giving $k$. Hence from (112) and (114) we have

$$\mathrm{Prob}\left(G \text{ fails to be an expander}\right) < p'_{\max}(N, k; d, \epsilon) \times \exp\left[N \cdot \Psi(k, n, N; d, \epsilon)\right]. \tag{115}$$

As the problem size $(k, n, N)$ grows, the exponential term drives the probability in (115); hence having

$$\Psi(k, n, N; d, \epsilon) < 0 \tag{116}$$

yields $\mathrm{Prob}\left(G \text{ fails to be an expander}\right) \to 0$ as the problem size $(k, n, N) \to \infty$.

Let $k/n \to \rho \in (0, 1)$ and $n/N \to \delta \in (0, 1)$ as $(k, n, N) \to \infty$, and define $\rho^{exp}_{bi}(\delta; d, \epsilon)$ as the limiting value of $k/n$ that satisfies $\Psi(k, n, N; d, \epsilon) = 0$ for each fixed $\epsilon$ and $d$ and all $\delta$. Note that for fixed $\epsilon$, $d$ and $\delta$ it is deducible from our analysis of $\psi_n(\cdot)$ in Section II-B that $\Psi(k, n, N; d, \epsilon)$ is a strictly monotonically increasing function of $k/n$. Therefore, for any $\rho < \rho^{exp}_{bi}$ we have $\Psi(k, n, N; d, \epsilon) < 0$ as $(k, n, N) \to \infty$, so $\mathrm{Prob}\left(G \text{ fails to be an expander}\right) \to 0$ and $G$ becomes an expander with probability approaching one exponentially in $N$, which is the same as exponential growth in $n$ since $n \to \delta N$.
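The sign condition on the exponent can be explored for finite problem sizes. The Python sketch below assumes the exponent has the form $H(s/N) + (ds/N) H(\epsilon) + (\epsilon d s/N) \log(ds/n)$, which is what results from bounding each binomial coefficient in the union bound by an entropy exponential. It then scans upward from $s = 2$ for the largest $k$ at which the exponent is still negative. The function names and the parameter values are hypothetical, and the finite scan is only a stand-in for the limiting quantity defined above.

```python
import math


def shannon_entropy(p):
    """H(p) = -p log p - (1 - p) log(1 - p), with natural logarithms."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)


def psi(s, n, N, d, eps):
    """Assumed exponent: H(s/N) + (ds/N) H(eps) + (eps ds/N) log(ds/n)."""
    ds = d * s
    return (
        shannon_entropy(s / N)
        + (ds / N) * shannon_entropy(eps)
        + (eps * ds / N) * math.log(ds / n)
    )


def largest_k_with_negative_exponent(n, N, d, eps):
    """Largest k with psi(s) < 0 for every 2 <= s <= k, or None if there is none."""
    k = None
    for s in range(2, N // 2 + 1):
        if psi(s, n, N, d, eps) >= 0.0:
            break
        k = s
    return k


# Hypothetical problem sizes, chosen only for illustration.
N, n, d, eps = 2**20, 2**18, 32, 0.25
k_max = largest_k_with_negative_exponent(n, N, d, eps)
```

The ratio `k_max / n` then gives a finite-size impression of where the exponent changes sign for this choice of $n/N$, $d$ and $\epsilon$.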